LangSmith

Published by Dusan Belic on May 13, 2024

LangSmith by LangChain

Categories Coding & Development Enterprise

An online tool that helps developers get their Large Language Model app from prototype to production

LangSmith is an online tool that helps developers get their Large Language Model (LLM) app from prototype to production. It is an all-in-one DevOps platform for every step of the LLM-powered application lifecycle. In other words, LangSmith is made to help with developing, collaborating, testing, deploying, and monitoring LLM applications.

The problem is that while LLM apps are powerful, they have peculiar characteristics. Their non-determinism, coupled with unpredictable natural-language inputs, makes for countless ways the system can fall short. Traditional engineering best practices need to be re-imagined for working with LLMs, and that’s where LangSmith kicks in to support all phases of the development lifecycle.

It offers full visibility into the entire sequence of calls, so that developers can spot the source of errors and performance bottlenecks in real time with surgical precision. They can debug, experiment, observe, and repeat until they’re happy with the results.

LangSmith also lets developers collaborate with their teammates to get app behavior just right. And finally, the platform supports testing and AI-assisted evaluations, with off-the-shelf and custom evaluators that can check for relevance, correctness, harmfulness, insensitivity, and more.

As of May 2024, LangSmith has more than 100K users signed up, 200M+ traces logged, and 20K+ monthly active teams.




What are the key features? ⭐

Traces: Easily share a chain trace with colleagues, clients, or end users, bringing explainability to anyone with the shared link.
Hub: LangSmith Hub lets you craft, version, and comment on prompts. No engineering experience required.
Annotation Queues: LangSmith Annotation Queues let you add human labels and feedback to traces.
Datasets: Easily collect examples and construct datasets from production data or existing sources. Datasets can be used for evaluations, few-shot prompting, and even fine-tuning.
Test & evaluate: Measure quality over large test suites. Layer in human feedback on runs or use AI-assisted evaluation with off-the-shelf and custom evaluators that can check for relevance, correctness, harmfulness, insensitivity, and more.

Who is it for? 🤔

LangSmith is made for developers who need to get their LLM app from prototype to production, and it supports them at every step along the way. As a unified DevOps platform, it lets developer teams develop, collaborate on, test, deploy, and monitor their LLM applications. As a result, it gives teams clear visibility into the development process, making it easier to manage.

Examples of what you can use it for 💭

Collaborate with teammates to get app behavior just right
Quickly save debugging and production traces to datasets, which are collections of either exemplary or problematic inputs and outputs
Use an LLM and prompt to score your application output, or write your own functional evaluation tests
See how the performance of the evaluation criteria that you've defined is affected by changes to your application
Track qualitative characteristics of any live application and spot issues in real time with LangSmith monitoring
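To make the "write your own functional evaluation tests" item concrete, here is a minimal sketch in plain Python. It does not use the LangSmith SDK; the names `exact_match`, `run_suite`, `toy_app`, and `DATASET` are hypothetical and exist only for illustration. It shows the general shape of the idea: a dataset of inputs and expected outputs, an evaluator that scores one output, and a loop that averages the scores.

```python
from typing import Callable

# Hypothetical dataset: each example pairs an input with the expected output.
DATASET = [
    {"input": "Capital of France?", "expected": "paris"},
    {"input": "Capital of Peru?", "expected": "Lima"},
]


def toy_app(question: str) -> str:
    """Stand-in for an LLM application (assumption: yours would call a model)."""
    answers = {"capital of france?": "Paris"}
    return answers.get(question.lower(), "unknown")


def exact_match(output: str, reference: str) -> float:
    """Functional evaluator: 1.0 if the normalized strings match, else 0.0."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def run_suite(
    app: Callable[[str], str],
    dataset: list[dict[str, str]],
    evaluator: Callable[[str, str], float],
) -> float:
    """Run the app on every example and return the mean evaluator score."""
    scores = [evaluator(app(ex["input"]), ex["expected"]) for ex in dataset]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(f"mean score: {run_suite(toy_app, DATASET, exact_match):.2f}")
```

Re-running the same suite after each change to the application is what lets you see how a defined criterion moves over time. An LLM-based scorer would slot in as another `evaluator` function with the same signature.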

Pros & Cons ⚖️

Pros:
A unified DevOps platform for your LLM applications
Helps developers deliver LLM-based software quickly and easily
Makes it easier to manage the complexity of LLM software

Cons:
It's not a magic wand; someone still has to do the work

FAQs 💬

What is LangSmith?

LangSmith is a unified platform for building, testing, evaluating, and monitoring large language model (LLM) applications. It offers complete visibility into agent behavior through tracing, real-time dashboards, and insights, helping developers debug issues and improve performance without adding latency to apps.

What are the main features of LangSmith?

Key features include step-by-step tracing for debugging, live monitoring of metrics like costs and latency, automatic insights into usage patterns, prompt experimentation, dataset management for evaluations, and collaboration tools for teams. It integrates seamlessly with frameworks like LangChain via a single environment variable.

Who is LangSmith best for?

It's ideal for developers and teams building production-grade AI agents, especially those using LangChain or LangGraph. Beginners prototyping apps might find it useful too, but it's geared toward anyone needing observability in non-deterministic LLM workflows, from startups to enterprises.

How does LangSmith integrate with LangChain?

LangSmith works hand-in-hand with LangChain for easy tracing and evaluation. Just set one environment variable, and it captures every step of your chains or agents, making it simple to monitor and debug without extra code.
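As a rough sketch of what that setup can look like in Python: the variable names below (`LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT`) are the ones LangChain has documented for LangSmith tracing, but treat them as an assumption and check the current docs, since newer releases also accept `LANGSMITH_`-prefixed names. In practice an API key is needed alongside the tracing switch. The helper `enable_tracing` and the project name are hypothetical.

```python
import os


def enable_tracing(api_key: str, project: str = "my-first-project") -> dict[str, str]:
    """Set the environment variables LangChain reads to send traces to LangSmith.

    Assumption: variable names follow the LangChain documentation at the time
    of writing; verify them against the current LangSmith docs.
    """
    settings = {
        "LANGCHAIN_TRACING_V2": "true",  # the switch that turns tracing on
        "LANGCHAIN_API_KEY": api_key,    # key created in the LangSmith UI
        "LANGCHAIN_PROJECT": project,    # optional: groups traces by project
    }
    os.environ.update(settings)
    return settings


# Usage (placeholder key; call this before constructing chains or agents):
# enable_tracing(api_key="<your-api-key>")
```

Setting the same variables in the shell or a `.env` file works equally well; doing it in code is mainly convenient in notebooks.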

What are LangSmith's pricing plans?

Plans start with a free Developer tier (5k traces/month, 1 seat), move to Plus at $39/user/month (10k traces/month, up to 10 seats), and include a Startup discount for early-stage teams. Enterprise offers custom pricing with self-hosting; traces cost extra based on retention (basic: $0.50/1k for 14 days, extended: $4.50/1k for 400 days).
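For a back-of-the-envelope estimate on the Plus plan, the figures above can be combined as in the sketch below. It rests on several assumptions not stated in the text: that the 10k included traces are shared by the whole team, that only traces beyond that allowance are billed, that billing is pro rata per 1k traces, and that all overage traces use one retention tier. The function name `estimate_plus_monthly_cost` is hypothetical, and actual invoices may be calculated differently.

```python
# Figures quoted in the answer above; check the pricing page before relying on them.
PLUS_SEAT_USD = 39.00
PLUS_INCLUDED_TRACES = 10_000
PLUS_MAX_SEATS = 10
USD_PER_1K = {"basic": 0.50, "extended": 4.50}  # 14-day vs 400-day retention


def estimate_plus_monthly_cost(seats: int, traces: int, retention: str = "basic") -> float:
    """Estimate a monthly Plus bill in USD under the assumptions stated above."""
    if not 1 <= seats <= PLUS_MAX_SEATS:
        raise ValueError(f"Plus supports 1 to {PLUS_MAX_SEATS} seats")
    if traces < 0:
        raise ValueError("traces must be non-negative")
    if retention not in USD_PER_1K:
        raise ValueError(f"retention must be one of {sorted(USD_PER_1K)}")

    # Assumption: only traces beyond the included allowance are charged.
    overage = max(0, traces - PLUS_INCLUDED_TRACES)
    return seats * PLUS_SEAT_USD + (overage / 1000) * USD_PER_1K[retention]


if __name__ == "__main__":
    # 3 seats and 50,000 traces: 3 * 39 + 40 * 0.50
    print(estimate_plus_monthly_cost(3, 50_000))
    # Same volume with extended retention: 3 * 39 + 40 * 4.50
    print(estimate_plus_monthly_cost(3, 50_000, "extended"))
```

The gap between the two tiers grows linearly with trace volume, which is why retention choice matters most for high-traffic applications.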

Can I self-host LangSmith?

Yes, on the Enterprise plan, you can deploy it on your Kubernetes cluster in AWS, GCP, or Azure to keep data in your environment. It's not available on lower tiers, though, so teams prioritizing data privacy often go this route.

What are some alternatives to LangSmith?

Popular options include Langfuse (open-source, self-hostable for tracing and evals), Helicone (model-layer observability with a gateway), Braintrust (UI-driven evals for non-technical teams), and Phoenix (focus on explainability). Choose based on needs like cost or framework agnosticism.

Is LangSmith easy for beginners to use?

It has a learning curve if you're new to LLM observability, but the quickstarts and tutorials make setup straightforward: create an API key, then trace a simple chain. Most developers are likely to get value quickly, especially from the visual playground for testing prompts.

How does LangSmith handle data privacy?

It doesn't train on your data, and you retain full ownership. The hosted service runs on GCP us-central1 by default and is SOC 2 Type 2, HIPAA, and GDPR compliant. Self-hosting keeps data inside your own environment, which is likely to appeal to regulated industries.

What do users say about LangSmith's pros and cons?

Pros: Excellent for debugging agents, real-time metrics, and team collaboration; saves time on production issues. Cons: UI can feel clunky or slow at times, pricing adds up for high-volume traces, and it's tied closely to LangChain, which might limit flexibility for other stacks.