How to Debug a Failing LangGraph Agent with OTLP Traces
Step-by-step guide to diagnosing infinite loops, failing tool calls, and token blowout in LangGraph, CrewAI, and OpenAI Agents SDK agents using OTLP span data.
When a LangGraph agent fails silently, loops indefinitely, or burns through your token budget in a single run, the log output rarely tells you enough. What you need is the OTLP trace — the structured record of every span the agent emitted, including parent-child relationships, durations, token counts, and status codes.
This post walks through three of the most common failure patterns and shows exactly which span data to look at for each one.
What you need before you start
Export the trace from LangSmith, Langfuse, or any OpenTelemetry Collector file exporter as OTLP JSON (the resourceSpans format). In LangSmith: open a run → Export → OTLP. In Langfuse: open a trace → Export → JSON. Paste it into ContextIQ's Agent Trace Inspector to render the graph and timeline — or read the raw JSON directly if you prefer.
All examples below use the OpenTelemetry GenAI semantic conventions (gen_ai.* attributes). If your trace uses LangSmith vendor attributes (langsmith.span.kind) instead, see the note at the end.
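If you would rather read the raw JSON than use a viewer, the sketch below flattens an OTLP JSON export into one plain dict per span. The function names (attr_value, flatten_spans) and the output keys are choices made for this example, not part of any library, and the inline sample stands in for your real export.

```python
import json

def attr_value(value):
    """Unwrap an OTLP AnyValue such as {"intValue": "42"} into a plain Python value."""
    if "intValue" in value:
        return int(value["intValue"])  # OTLP JSON encodes 64-bit integers as strings
    for key in ("stringValue", "doubleValue", "boolValue"):
        if key in value:
            return value[key]
    return None  # arrays, key-value lists and bytes are ignored in this sketch

def flatten_spans(trace):
    """Yield every span under resourceSpans -> scopeSpans -> spans as a flat dict."""
    for resource_spans in trace.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            for span in scope_spans.get("spans", []):
                status = span.get("status", {})
                yield {
                    "span_id": span.get("spanId"),
                    "parent_id": span.get("parentSpanId") or None,
                    "name": span.get("name"),
                    "status_code": status.get("code", 0),
                    "status_message": status.get("message", ""),
                    "attrs": {a["key"]: attr_value(a["value"])
                              for a in span.get("attributes", [])},
                }

# Tiny made-up trace so the sketch runs on its own.
# In practice: json.load(open("your_export.json")).
sample_json = """
{"resourceSpans": [{"scopeSpans": [{"spans": [{
  "spanId": "abc123",
  "parentSpanId": "",
  "name": "chat openai",
  "attributes": [
    {"key": "gen_ai.request.model", "value": {"stringValue": "gpt-4o-mini"}},
    {"key": "gen_ai.usage.input_tokens", "value": {"intValue": "1240"}}
  ],
  "status": {"code": 1}
}]}]}]}
"""

spans = list(flatten_spans(json.loads(sample_json)))
print(spans[0]["name"], spans[0]["attrs"])
```

Converting intValue with int() matters because token counts otherwise arrive as strings and sort lexicographically.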
Failure pattern 1: the infinite loop
What it looks like in the logs
ResearchAgent iteration 1
ResearchAgent iteration 2
ResearchAgent iteration 3
...
Run terminated after 120s
What the OTLP trace shows
A looping agent produces repeated spans with the same gen_ai.agent.name value, each with incrementing start times. In the span tree:
langchain.workflow
├── ResearchAgent (span 1, 0ms–1200ms)
├── ResearchAgent (span 2, 1250ms–2600ms)
├── ResearchAgent (span 3, 2650ms–4100ms)
└── ...
Each iteration is a sibling span, not a child of the previous one. The key diagnostic signal is the callCount: the number of spans that share the same gen_ai.agent.name. As a rule of thumb, a healthy ReAct agent loops 2–4 times; more than 6 iterations without a termination condition firing points to a loop.
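The call count is easy to compute from the raw spans. The following is a minimal sketch that works on span objects as they appear in the OTLP JSON; the helper names (attr, call_counts, suspected_loops) are made up for this example, and the threshold of 6 is the rule of thumb from the text rather than a hard limit.

```python
from collections import Counter

def attr(span, key):
    """Return the raw value of one OTLP attribute, or None if it is absent."""
    for a in span.get("attributes", []):
        if a["key"] == key:
            return next(iter(a["value"].values()), None)
    return None

def call_counts(spans):
    """Count spans per gen_ai.agent.name; spans without the attribute are skipped."""
    names = (attr(s, "gen_ai.agent.name") for s in spans)
    return Counter(n for n in names if n)

def suspected_loops(spans, threshold=6):
    """Return agents whose span count exceeds the threshold."""
    return {name: n for name, n in call_counts(spans).items() if n > threshold}

# Made-up data: eight ResearchAgent iterations plus one unrelated tool span.
agent_span = {
    "name": "ResearchAgent",
    "attributes": [
        {"key": "gen_ai.agent.name", "value": {"stringValue": "ResearchAgent"}}
    ],
}
spans = [dict(agent_span) for _ in range(8)] + [{"name": "web_search", "attributes": []}]

print(suspected_loops(spans))
```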
The root cause in the span data
Inside each agent iteration, find the chat openai or chat anthropic child spans and check the output token count (gen_ai.usage.output_tokens) on each LLM call. Two patterns cause loops:
- Low output tokens (< 50): the model is returning a bare tool call with no reasoning. It's calling the same tool repeatedly because nothing in the prompt forces it to stop.
- Consistent output tokens (~same every iteration): the model is stuck in a pattern. The context is not changing enough between iterations to change the decision.
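The two patterns above can be checked mechanically once you have the output token count for each iteration. This is a rough sketch: the 50-token cutoff comes from the text, while the 10% spread used to decide that counts are "about the same" is an assumption you should tune, and classify_output_tokens is a name invented for this example.

```python
from statistics import mean, pstdev

def classify_output_tokens(counts, low=50, spread=0.1):
    """Label a list of per-iteration gen_ai.usage.output_tokens values.

    low    -- below this average, treat the calls as bare tool calls
    spread -- assumed tolerance: a standard deviation within this fraction
              of the mean counts as "the same every iteration"
    """
    if not counts:
        return "no data"
    avg = mean(counts)
    if avg < low:
        return "bare tool calls"
    if len(counts) > 1 and pstdev(counts) <= spread * avg:
        return "stuck pattern"
    return "varied"

# Made-up token counts for three different runs.
print(classify_output_tokens([22, 25, 21]))
print(classify_output_tokens([310, 305, 312, 308]))
print(classify_output_tokens([120, 480, 90]))
```

A "varied" result does not rule out a loop; it only means the cause is probably not one of these two patterns.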
How to fix it
Inspect the gen_ai.request.model attribute on the looping LLM spans. If it's gpt-4o-mini or claude-haiku-4-5, consider upgrading to a stronger model for the planning step. If it's already a capable model, the issue is in your stopping condition — the should_continue edge in your LangGraph graph needs a stricter check on iteration_count or tool_result_quality.
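A stricter stopping condition might look like the sketch below. It is plain Python with no LangGraph import so it runs anywhere; in a LangGraph graph a function like this would typically be registered as the routing function via add_conditional_edges. The state keys iteration_count and tool_result_quality, the 0-to-1 quality scale, and both thresholds are assumptions for illustration, and your graph has to maintain those keys itself.

```python
MAX_ITERATIONS = 4   # assumption: upper end of the 2-4 loops a healthy agent needs
MIN_QUALITY = 0.7    # assumption: hypothetical 0-1 score written by your tool step

def should_continue(state):
    """Route to "end" once the budget is spent or the tool result is good enough."""
    if state.get("iteration_count", 0) >= MAX_ITERATIONS:
        return "end"       # hard cap, regardless of what the model wants
    if state.get("tool_result_quality", 0.0) >= MIN_QUALITY:
        return "end"       # result is usable, no need for another pass
    return "continue"

# Made-up states showing each branch.
print(should_continue({"iteration_count": 1, "tool_result_quality": 0.2}))
print(should_continue({"iteration_count": 1, "tool_result_quality": 0.9}))
print(should_continue({"iteration_count": 4, "tool_result_quality": 0.2}))
```

Checking the iteration cap first means a bad quality score can never keep the loop alive.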
Failure pattern 2: a tool call that errors silently
What it looks like in the logs
Tool call: web_search
Tool call: web_search
Final answer: [empty]
What the OTLP trace shows
Every span carries an OTLP status code: 0 is UNSET (the default), 1 is OK, and 2 is ERROR. A silently failing tool call will show:
{
  "spanId": "abc123",
  "name": "web_search",
  "attributes": [
    { "key": "gen_ai.tool.name", "value": { "stringValue": "web_search" } }
  ],
  "status": { "code": 2, "message": "ConnectionError: timeout after 5000ms" }
}
The agent span above it (gen_ai.agent.name = "ResearchAgent") may still show code: 1 (OK), or code: 0 (UNSET) if the instrumentation never sets a status, because the agent caught the exception and continued. It simply got an empty result back from the tool.
Finding it fast
Sort the spans by status code. Any span with status.code === 2 is an error. In the Agent Trace Inspector, errored nodes render with a red border and ⚠ badge so you don't have to scan manually.
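If you are working from the raw JSON, filtering for errors takes a few lines. This sketch assumes you already have a list of span objects from the export; error_spans is a name chosen for this example. It also accepts the string form STATUS_CODE_ERROR as a defensive measure, on the assumption that some exporters write enum names instead of integers.

```python
ERROR_CODES = (2, "STATUS_CODE_ERROR")  # integer form, plus enum-name form (assumed)

def error_spans(spans):
    """Return (span name, status message) for every span whose status is ERROR."""
    return [
        (s.get("name"), s["status"].get("message", ""))
        for s in spans
        if s.get("status", {}).get("code") in ERROR_CODES
    ]

# Made-up spans: the agent reports OK while its tool call failed.
spans = [
    {"name": "ResearchAgent", "status": {"code": 1}},
    {"name": "web_search",
     "status": {"code": 2, "message": "ConnectionError: timeout after 5000ms"}},
    {"name": "chat openai"},  # no status object at all, treated as UNSET
]

failures = error_spans(spans)
for name, message in failures:
    print(f"{name}: {message}")
```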
The actual fix
The status.message field on the failing span usually contains the raw exception. Common causes:
| Tool error message | Root cause |
|---|---|
| ConnectionError: timeout | External API unreachable, or timeout too short |
| ValidationError: unexpected field | Tool schema mismatch: the model output doesn't match what the tool expects |
| AuthenticationError: 401 | Credentials missing or rotated |
| RateLimitError | Upstream API rate limit; add backoff or a fallback |
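For the rate limit and timeout rows, one common remedy is to retry with exponential backoff. The sketch below is one way to do that under stated assumptions: RateLimitError here is a stand-in class, since the real exception type depends on the client library your tool uses, and call_with_backoff and flaky_search are hypothetical names.

```python
import time

class RateLimitError(Exception):
    """Stand-in for the exception your tool's client raises (hypothetical)."""

def call_with_backoff(fn, retries=3, base_delay=0.5, sleep=time.sleep,
                      retry_on=(RateLimitError, TimeoutError)):
    """Call fn(), retrying on the given exceptions with doubling delays."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: let the error surface in the trace
            sleep(base_delay * 2 ** attempt)

# Made-up tool that is rate limited twice, then succeeds.
attempts = []

def flaky_search():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimitError("429")
    return "results"

delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_search, sleep=delays.append)
print(result, delays)
```

Re-raising after the last attempt is deliberate: a tool that swallows the final error reproduces the silent failure this section is about.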
Once you fix the tool, re-run and compare the new trace. The error should disappear from that span; if the agent now produces a different output, you've confirmed the tool call was the failure point.
Failure pattern 3: unexpected token cost
What it looks like in the logs
Run completed in 4.2s
Total tokens: 38,420
That's higher than expected. The logs won't tell you which agent or which LLM call is responsible.
What the OTLP trace shows
Each span that represents an LLM call carries gen_ai.usage.input_tokens and gen_ai.usage.output_tokens. To find the expensive spans, look for:
- Spans where input_tokens is unusually high. This means the prompt at that point was large, usually because a tool returned a very long string that was passed directly into it.
- Spans deep in the hierarchy with high input_tokens. An agent that calls a tool, gets a 50,000-character response, and passes the full response to the next LLM call will produce a deeply nested span with an input token count in the tens of thousands.
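To find those spans without scanning by eye, rank the LLM call spans by input tokens. This is a minimal sketch over span objects as they appear in the OTLP JSON; attr and top_input_spans are names invented for this example, and the sample spans are made up.

```python
def attr(span, key):
    """Return the raw value of one OTLP attribute, or None if it is absent."""
    for a in span.get("attributes", []):
        if a["key"] == key:
            return next(iter(a["value"].values()), None)
    return None

def top_input_spans(spans, n=3):
    """Return the n spans with the highest gen_ai.usage.input_tokens."""
    rows = []
    for s in spans:
        tokens = attr(s, "gen_ai.usage.input_tokens")
        if tokens is not None:
            rows.append((int(tokens), s.get("name")))  # intValue arrives as a string
    return sorted(rows, key=lambda row: row[0], reverse=True)[:n]

def tokens_attr(count):
    # Helper for building the made-up sample spans below.
    return [{"key": "gen_ai.usage.input_tokens", "value": {"intValue": str(count)}}]

spans = [
    {"name": "ResearchAgent", "attributes": tokens_attr(1240)},
    {"name": "web_search", "attributes": []},  # tool spans carry no token usage
    {"name": "chat openai", "attributes": tokens_attr(18700)},
]

for tokens, name in top_input_spans(spans):
    print(f"{tokens:>7}  {name}")
```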
Example trace structure for a token blowout
ResearchAgent (input: 1,240 tokens)
├── web_search (tool call — returns 48,000 chars)
└── chat openai (input: 18,700 tokens ← the problem span)
The web_search tool returned raw HTML or a long document, which the agent passed verbatim to the next LLM call. The chat openai span's gen_ai.usage.input_tokens makes this visible immediately.
Fixing it
The fix depends on what the tool returns. If it's a web scraper or document reader, add a summarization step before passing the result to the next LLM call. If it's a database query, filter the result to only the fields the agent needs. The trace tells you exactly which tool and which LLM call are the bottleneck — you don't need to guess.
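Summarization needs another model call, so as a cheaper first guard you can cap the size of any tool result before it reaches the prompt. The sketch below swaps in plain truncation for summarization; clip_tool_output and the 4,000-character budget are assumptions for illustration, not a recommendation for your workload.

```python
def clip_tool_output(text, max_chars=4000):
    """Truncate a tool result to max_chars and note how much was dropped."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    # The marker tells the model the result is incomplete rather than hiding it.
    return text[:max_chars] + f"\n[truncated {omitted} characters]"

raw_html = "x" * 48000            # stand-in for a 48,000-character tool response
clipped = clip_tool_output(raw_html)
print(len(raw_html), "->", len(clipped))
```

Truncation keeps whatever happens to come first, so it suits results where the useful part is at the top; for long documents a real summarization or extraction step is still the better fix.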
A note on LangSmith vendor attributes
Not all traces use the gen_ai.* convention. Traces exported from LangSmith directly — rather than via an OpenTelemetry Collector — may use langsmith.span.kind instead of gen_ai.agent.name, and langsmith.run.name for the node name. The Agent Trace Inspector normalizes both automatically, so you can paste either format and get the same graph. If your trace renders blank, the likely cause is a vendor namespace that isn't covered yet — check the attributes array on a few spans to see what keys are present.
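If you process traces yourself, the same normalization can be approximated with a fallback chain, and listing the attribute keys helps when a trace renders blank. This is a sketch only: it is not how the Agent Trace Inspector is implemented, the function names are invented, and the sample attribute values are made up.

```python
def attr(span, key):
    """Return the raw value of one OTLP attribute, or None if it is absent."""
    for a in span.get("attributes", []):
        if a["key"] == key:
            return next(iter(a["value"].values()), None)
    return None

def node_name(span):
    """Prefer the GenAI convention, then the LangSmith attribute, then the span name."""
    return (attr(span, "gen_ai.agent.name")
            or attr(span, "langsmith.run.name")
            or span.get("name"))

def attribute_keys(spans):
    """Collect every attribute key present, to see which namespace a trace uses."""
    return {a["key"] for s in spans for a in s.get("attributes", [])}

# Made-up spans in each style; the "chain" value is illustrative only.
genai_span = {"name": "invoke_agent", "attributes": [
    {"key": "gen_ai.agent.name", "value": {"stringValue": "ResearchAgent"}}]}
langsmith_span = {"name": "run", "attributes": [
    {"key": "langsmith.span.kind", "value": {"stringValue": "chain"}},
    {"key": "langsmith.run.name", "value": {"stringValue": "ResearchAgent"}}]}
bare_span = {"name": "web_search"}

all_spans = [genai_span, langsmith_span, bare_span]
print([node_name(s) for s in all_spans])
print(sorted(attribute_keys(all_spans)))
```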
Summary
| Failure pattern | Span signal to look for |
|---|---|
| Infinite loop | Multiple spans with same gen_ai.agent.name, high callCount |
| Silent tool error | status.code === 2 on a tool span, check status.message |
| Token blowout | High gen_ai.usage.input_tokens on an LLM call span deep in the tree |
All three patterns are visible in the raw OTLP JSON. The span hierarchy, status codes, and token attributes are designed exactly for this kind of post-mortem. You don't need a dedicated observability platform — just the trace and something that can read it.