I've built LLM applications both ways: using LangChain as the orchestration layer, and building custom orchestration from scratch. My AI Gymbro fitness app started with LangChain and was rewritten as custom orchestration six weeks in.
LangChain is a Python/JavaScript framework that provides abstractions for common LLM patterns: chains (sequences of LLM calls), agents (LLM + tools + loop), retrieval (vector store integration), and memory (conversation history management). The value proposition: a unified interface across multiple LLM providers, built-in implementations of common patterns, and a growing ecosystem of integrations.
Every abstraction layer adds debugging distance between your code and the underlying API. When a LangChain chain fails, you get a stack trace through 5 layers of LangChain internals before you see the actual API error. I spent 3 hours debugging a SequentialChain that was silently dropping outputs.
LangChain is valuable for three use cases: rapid prototyping (get a RAG pipeline running in 50 lines of Python vs 200+ lines custom), multi-provider support (switch between OpenAI, Anthropic, and Bedrock without code changes), and complex agent workflows via LangGraph.
LangChain vs Custom: Decision Framework
Start
│
▼
Do you need to switch LLM providers often?
├── Yes → LangChain (unified interface)
└── No → Continue...
│
▼
Is your workflow > 4 states with branching?
├── Yes → LangGraph
└── No → Continue...
│
▼
Do you have > 5 integrations (vector stores,
document loaders, custom tools)?
├── Yes → LangChain ecosystem
└── No → Continue...
│
▼
How many LLM calls per user request?
├── 1-3 → Custom orchestration (simpler, faster)
└── 4+ → Evaluate LangGraph
│
▼
Do you need sub-200ms latency?
├── Yes → Custom (no abstraction overhead)
└── No → Either works
Verdict:
Simple app (1-3 calls, < 5 integrations) → Custom
Complex agent (many states, multi-provider) → LangChain/LangGraphIf you're using LangChain, integrate LangSmith from day one. It captures full trace data — every LLM call, every chain step, every token count — and makes debugging dramatically easier. The free tier is generous enough for development and small-scale production.
Build custom orchestration when: your workflow is simple (1-3 LLM calls per user request); you need precise control over token usage and caching; you need streaming output with specific UI behavior; or your team is more comfortable debugging JavaScript/Python than framework internals.
LangGraph is worth separating from LangChain proper. The graph-based workflow model (nodes are actions, edges are state transitions) is genuinely useful for complex multi-step agent workflows that have branching paths, conditional execution, and state that needs to persist across multiple turns.
# LangChain RAG — 50 lines, fast to prototype
from langchain_anthropic import ChatAnthropic
from langchain_community.vectorstores import PGVector
from langchain.chains import RetrievalQA
llm = ChatAnthropic(model="claude-3-5-haiku-20241022")
vectorstore = PGVector.from_existing_index(connection_string=DB_URL)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
result = chain.invoke({"query": "What exercises target the lats?"})
# Custom RAG — 200 lines, full control
async def custom_rag(query: str) -> str:
# 1. HyDE — generate hypothetical answer for better embedding
hyp_answer = await llm.generate(f"Write a brief answer to: {query}")
embedding = await embed(hyp_answer)
# 2. Vector search with metadata filter
chunks = await db.search(embedding, filter={"type": "exercise"}, limit=20)
# 3. Rerank
reranked = await cohere_rerank(query, chunks, top_n=5)
# 4. Generate with precise prompt + caching
context = "
".join(c.content for c in reranked)
return await llm.generate(
system=CACHED_SYSTEM_PROMPT, # prompt caching
user=f"Context:
{context}
Question: {query}"
)
# Custom is more work but: 30% fewer tokens, 25% lower latency, easier to debugUse LangChain/LangGraph if: you're building a RAG application and want fast iteration on retrieval strategies; you need multi-provider support; you're building complex stateful agent workflows; or your team already knows LangChain. Skip LangChain if: your workflow has fewer than 3 LLM calls; you need precise token control; or debuggability is critical.
LangChain has had numerous breaking changes between major versions. If you build production systems on LangChain, pin your versions strictly in requirements.txt and budget time for version migration work every 6-12 months. I've seen teams lose entire weekends to LangChain upgrade problems.
For a new LLM project in 2025, I'd start with direct API calls using Anthropic or OpenAI's official SDK (both are excellent), add LangSmith for observability (it works without LangChain), and introduce LangGraph only if the workflow complexity genuinely warrants it.
One genuine advantage of LangChain in 2025 is the ecosystem: 600+ integrations, a large community, and LangSmith for observability. If you need to integrate with specific vector databases, LangChain's pre-built connectors save real time. The question is whether the ecosystem value outweighs the abstraction cost for your specific use case.