LangGraph · OTLP · OpenTelemetry · debugging · LangSmith · CrewAI · OpenAI Agents SDK

How to Debug a Failing LangGraph Agent with OTLP Traces

Step-by-step guide to diagnosing infinite loops, failing tool calls, and token blowout in LangGraph, CrewAI, and OpenAI Agents SDK agents using OTLP span data.

June 23, 2026 · Trango Compute Inc.

When a LangGraph agent fails silently, loops indefinitely, or burns through your token budget in a single run, the log output rarely tells you enough. What you need is the OTLP trace — the structured record of every span the agent emitted, including parent-child relationships, durations, token counts, and status codes.

This post walks through three of the most common failure patterns and shows exactly which span data to look at for each one.

What you need before you start

Export the trace from LangSmith, Langfuse, or any OpenTelemetry Collector file exporter as OTLP JSON (the resourceSpans format). In LangSmith: open a run → Export → OTLP. In Langfuse: open a trace → Export → JSON. Paste it into ContextIQ's Agent Trace Inspector to render the graph and timeline — or read the raw JSON directly if you prefer.

All examples below use the OpenTelemetry GenAI semantic conventions (gen_ai.* attributes). If your trace uses LangSmith vendor attributes (langsmith.span.kind) instead, see the note at the end.


Failure pattern 1: the infinite loop

What it looks like in the logs

ResearchAgent iteration 1
ResearchAgent iteration 2
ResearchAgent iteration 3
...
Run terminated after 120s

What the OTLP trace shows

A looping agent produces repeated spans with the same gen_ai.agent.name value, each with incrementing start times. In the span tree:

langchain.workflow
├── ResearchAgent   (span 1, 0ms–1200ms)
├── ResearchAgent   (span 2, 1250ms–2600ms)
├── ResearchAgent   (span 3, 2650ms–4100ms)
└── ...

Each iteration is a sibling span, not a child. The key diagnostic signal is the callCount: how many spans share the same gen_ai.agent.name value. A healthy ReAct agent typically loops 2–4 times; more than 6 iterations without a termination condition is a strong sign of a loop.
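Counting by hand gets tedious on long traces. The following is a minimal Python sketch of the callCount check, assuming an OTLP JSON export that uses the current scopeSpans key (older exporters may write instrumentationLibrarySpans instead). The helper names are hypothetical and not part of any SDK or of the Agent Trace Inspector.

```python
from collections import Counter

LOOP_THRESHOLD = 6  # rule of thumb from the text: more than 6 iterations suggests a loop


def iter_spans(trace: dict):
    """Yield every span in an OTLP JSON document (resourceSpans -> scopeSpans -> spans)."""
    for resource_spans in trace.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            yield from scope_spans.get("spans", [])


def string_attr(span: dict, key: str):
    """Return the stringValue of an attribute from the OTLP key/value list, or None."""
    for kv in span.get("attributes", []):
        if kv.get("key") == key:
            return kv.get("value", {}).get("stringValue")
    return None


def agent_call_counts(trace: dict) -> Counter:
    """Count how many spans share each gen_ai.agent.name value (the callCount signal)."""
    counts = Counter()
    for span in iter_spans(trace):
        name = string_attr(span, "gen_ai.agent.name")
        if name is not None:
            counts[name] += 1
    return counts


def suspected_loops(trace: dict, threshold: int = LOOP_THRESHOLD) -> dict:
    """Return the agents whose span count exceeds the threshold."""
    return {name: n for name, n in agent_call_counts(trace).items() if n > threshold}
```

Load the export with json.load and pass the resulting dict to suspected_loops; an empty dict means no agent crossed the threshold.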

The root cause in the span data

Look at the chat openai or chat anthropic child spans inside each agent iteration, specifically the output token count (gen_ai.usage.output_tokens) on each LLM call span. Two patterns commonly cause loops:

  • Low output tokens (< 50): the model is returning a bare tool call with no reasoning. It's calling the same tool repeatedly because nothing in the prompt forces it to stop.
  • Consistent output tokens (~same every iteration): the model is stuck in a pattern. The context is not changing enough between iterations to change the decision.
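Both patterns can be checked per iteration with a short script. This sketch sums gen_ai.usage.output_tokens over the direct children of each agent span, ordered by startTimeUnixNano, then applies the two checks. The 10% spread used for "about the same every iteration" is an assumed threshold, not something the GenAI conventions define, LLM spans nested more than one level below the agent span are not counted, and the function names are hypothetical.

```python
from statistics import mean, pstdev

LOW_OUTPUT_TOKENS = 50  # "bare tool call" threshold from the text
CONSTANT_CV = 0.1       # hypothetical: under 10% relative spread counts as "about the same"


def iter_spans(trace: dict):
    """Yield every span in an OTLP JSON document."""
    for resource_spans in trace.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            yield from scope_spans.get("spans", [])


def attr_value(span: dict, key: str):
    """Return an attribute's integer or string value, or None if absent."""
    for kv in span.get("attributes", []):
        if kv.get("key") == key:
            value = kv.get("value", {})
            if "intValue" in value:
                return int(value["intValue"])  # OTLP/JSON may encode int64 as a string
            return value.get("stringValue")
    return None


def output_tokens_per_iteration(trace: dict, agent_name: str) -> list:
    """Sum output tokens of each agent span's direct children, in start-time order."""
    spans = list(iter_spans(trace))
    iterations = sorted(
        (s for s in spans if attr_value(s, "gen_ai.agent.name") == agent_name),
        key=lambda s: int(s.get("startTimeUnixNano", 0)),
    )
    totals = []
    for agent_span in iterations:
        agent_id = agent_span.get("spanId")
        total = 0
        for child in spans:
            if agent_id and child.get("parentSpanId") == agent_id:
                tokens = attr_value(child, "gen_ai.usage.output_tokens")
                if isinstance(tokens, int):
                    total += tokens
        totals.append(total)
    return totals


def diagnose(totals: list) -> list:
    """Flag the two loop patterns described above."""
    findings = []
    if totals and max(totals) < LOW_OUTPUT_TOKENS:
        findings.append("low output tokens: bare tool calls, no reasoning")
    if len(totals) >= 3 and mean(totals) > 0 and pstdev(totals) / mean(totals) < CONSTANT_CV:
        findings.append("near-constant output tokens: context barely changes between iterations")
    return findings
```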

How to fix it

Inspect the gen_ai.request.model attribute on the looping LLM spans. If it's a smaller model such as gpt-4o-mini or claude-haiku-4-5, consider upgrading to a stronger model for the planning step. If it's already a capable model, the issue is in your stopping condition: the should_continue routing function on your LangGraph conditional edge needs a stricter check on iteration_count or tool_result_quality.
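As an illustration, a stricter routing function might look like the sketch below. It assumes a TypedDict state with an iteration_count counter plus two hypothetical fields, tool_result_quality (a 0 to 1 score your tools node would have to compute) and wants_tool_call (whether the last model turn requested a tool); the 0.8 quality bar is likewise an assumption. It deliberately imports nothing from LangGraph so it can be read and tested on its own.

```python
from typing import Literal, TypedDict

MAX_ITERATIONS = 6        # cap taken from the loop threshold discussed above
QUALITY_THRESHOLD = 0.8   # hypothetical "good enough to answer" score


class AgentState(TypedDict):
    iteration_count: int         # incremented by the agent node on every pass
    tool_result_quality: float   # hypothetical 0.0-1.0 score written by the tools node
    wants_tool_call: bool        # hypothetical: did the last LLM turn request a tool?


def should_continue(state: AgentState) -> Literal["continue", "end"]:
    """Route back to the tools node only while there is a reason to keep going."""
    # Hard stop: never exceed the iteration cap, whatever the model asks for.
    if state["iteration_count"] >= MAX_ITERATIONS:
        return "end"
    # Soft stop: the tool results are already good enough to answer.
    if state["tool_result_quality"] >= QUALITY_THRESHOLD:
        return "end"
    # Otherwise continue only if the model actually requested another tool call.
    return "continue" if state["wants_tool_call"] else "end"
```

In a LangGraph graph you would typically wire it in with add_conditional_edges, for example graph.add_conditional_edges("agent", should_continue, {"continue": "tools", "end": END}), where "agent" and "tools" stand in for your own node names. The hard iteration cap comes first on purpose: it bounds cost even when the quality signal is wrong.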


Failure pattern 2: a tool call that errors silently

What it looks like in the logs

Tool call: web_search
Tool call: web_search
Final answer: [empty]

What the OTLP trace shows

Every span carries an OTLP status code: code 0 is UNSET (the default, and what most instrumentations leave on successful spans), code 1 is OK, and code 2 is ERROR. A silently failing tool call will show:

{
  "spanId": "abc123",
  "name": "web_search",
  "attributes": [
    { "key": "gen_ai.tool.name", "value": { "stringValue": "web_search" } }
  ],
  "status": { "code": 2, "message": "ConnectionError: timeout after 5000ms" }
}

The agent span above it (gen_ai.agent.name = "ResearchAgent") may still show code 0 (UNSET) or 1 (OK) because the agent caught the exception and continued; it just got an empty result back from the tool.

Finding it fast

Sort the spans by status code. Any span with status.code === 2 is an error. In the Agent Trace Inspector, errored nodes render with a red border and ⚠ badge so you don't have to scan manually.
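If you are reading the raw JSON instead, the same filter is a few lines of Python. This sketch assumes the scopeSpans layout and that status.code is the integer enum the OTLP JSON encoding specifies; it also accepts the enum name STATUS_CODE_ERROR, which a generic protobuf-to-JSON serializer may emit instead. The function name is hypothetical.

```python
STATUS_CODE_ERROR = 2  # OTLP StatusCode: 0 = UNSET, 1 = OK, 2 = ERROR


def iter_spans(trace: dict):
    """Yield every span in an OTLP JSON document."""
    for resource_spans in trace.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            yield from scope_spans.get("spans", [])


def error_spans(trace: dict) -> list:
    """Return (span name, status message) for every span whose status is ERROR."""
    errors = []
    for span in iter_spans(trace):
        status = span.get("status", {})
        # Accept both the integer enum and its name, depending on the serializer.
        if status.get("code") in (STATUS_CODE_ERROR, "STATUS_CODE_ERROR"):
            errors.append((span.get("name", "<unnamed>"), status.get("message", "")))
    return errors
```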

The actual fix

The status.message field on the failing span usually contains the raw exception. Common causes:

Tool error message                | Root cause
ConnectionError: timeout          | External API unreachable, or timeout too short
ValidationError: unexpected field | Tool schema mismatch: the model output doesn't match what the tool expects
AuthenticationError: 401          | Credentials missing or rotated
RateLimitError                    | Upstream API rate limit; add backoff or a fallback
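When many spans fail, a small lookup keyed on the exception name gives a first guess at the cause. The sketch below simply encodes the table above; the substring match, the fallback text, and the function name are assumptions, and real messages vary by SDK.

```python
# Exception names and causes copied from the table above; extend with your own tools' errors.
KNOWN_CAUSES = [
    ("ConnectionError", "External API unreachable, or timeout too short"),
    ("ValidationError", "Tool schema mismatch: model output doesn't match the tool's schema"),
    ("AuthenticationError", "Credentials missing or rotated"),
    ("RateLimitError", "Upstream API rate limit; add backoff or a fallback"),
]


def likely_root_cause(status_message: str) -> str:
    """Map a span's status.message to a probable root cause by exception name."""
    for exception_name, cause in KNOWN_CAUSES:
        # Substring match, so prefixed names such as "openai.RateLimitError" still hit.
        if exception_name in status_message:
            return cause
    return "Unknown: read the full status.message and the tool's own logs"
```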

Once you fix the tool, re-run and compare the new trace. The error should disappear from that span; if the agent now produces a different output, you've confirmed the tool call was the failure point.
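One way to make that comparison mechanical is to diff the set of errored span names between the two runs. Names are used rather than spanId values because span IDs are regenerated on every run; the trade-off is that different spans sharing a name (for example, repeated web_search calls) collapse into one entry. The helper names are hypothetical.

```python
STATUS_CODE_ERROR = 2  # OTLP StatusCode ERROR


def iter_spans(trace: dict):
    """Yield every span in an OTLP JSON document."""
    for resource_spans in trace.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            yield from scope_spans.get("spans", [])


def errored_span_names(trace: dict) -> set:
    """Names of all spans whose status code is ERROR."""
    return {
        span.get("name", "<unnamed>")
        for span in iter_spans(trace)
        if span.get("status", {}).get("code") == STATUS_CODE_ERROR
    }


def compare_runs(before: dict, after: dict) -> dict:
    """Diff the errored spans of two runs of the same agent."""
    old, new = errored_span_names(before), errored_span_names(after)
    return {
        "fixed": sorted(old - new),
        "still_failing": sorted(old & new),
        "new_errors": sorted(new - old),
    }
```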


Failure pattern 3: unexpected token cost

What it looks like in the logs

Run completed in 4.2s
Total tokens: 38,420

That's higher than expected. The logs won't tell you which agent or which LLM call is responsible.

What the OTLP trace shows

Each span that represents an LLM call carries gen_ai.usage.input_tokens and gen_ai.usage.output_tokens. To find the expensive spans, look for:

  • Spans where gen_ai.usage.input_tokens is unusually high. This means the context window at that point was large, usually because a tool returned a very long string that was passed directly into the prompt.
  • Spans deep in the hierarchy with high input_tokens. An agent that calls a tool, gets a 50,000-character response, and passes the full response to the next LLM call will produce an LLM span nested under the agent with an input token count well above 10,000 (English text averages roughly four characters per token).
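To rank the candidates, a sketch like the following lists spans by gen_ai.usage.input_tokens alongside their depth in the tree, computed by walking parentSpanId links. It assumes token counts are recorded as intValue attributes (the OTLP JSON encoding allows the integer to arrive as a string, which the code handles) and uses hypothetical helper names.

```python
def iter_spans(trace: dict):
    """Yield every span in an OTLP JSON document."""
    for resource_spans in trace.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            yield from scope_spans.get("spans", [])


def int_attr(span: dict, key: str):
    """Return an integer attribute value, or None if absent."""
    for kv in span.get("attributes", []):
        if kv.get("key") == key and "intValue" in kv.get("value", {}):
            return int(kv["value"]["intValue"])  # may arrive as a JSON string
    return None


def span_depth(span: dict, by_id: dict) -> int:
    """Number of ancestors above a span, guarding against malformed parent cycles."""
    depth, seen = 0, set()
    parent = span.get("parentSpanId")
    while parent and parent in by_id and parent not in seen:
        seen.add(parent)
        depth += 1
        parent = by_id[parent].get("parentSpanId")
    return depth


def top_input_token_spans(trace: dict, limit: int = 5) -> list:
    """(input_tokens, depth, name) for the spans with the largest input token counts."""
    spans = list(iter_spans(trace))
    by_id = {s["spanId"]: s for s in spans if s.get("spanId")}
    rows = []
    for span in spans:
        tokens = int_attr(span, "gen_ai.usage.input_tokens")
        if tokens is not None:
            rows.append((tokens, span_depth(span, by_id), span.get("name", "<unnamed>")))
    rows.sort(reverse=True)
    return rows[:limit]
```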

Example trace structure for a token blowout

ResearchAgent       (input: 1,240 tokens)
├── web_search      (tool call — returns 48,000 chars)
└── chat openai     (input: 18,700 tokens ← the problem span)

The web_search tool returned raw HTML or a long document, which the agent passed verbatim to the next LLM call. The chat openai span's gen_ai.usage.input_tokens makes this visible immediately.

Fixing it

The fix depends on what the tool returns. If it's a web scraper or document reader, add a summarization step before passing the result to the next LLM call. If it's a database query, filter the result to only the fields the agent needs. The trace tells you exactly which tool and which LLM call are the bottleneck — you don't need to guess.
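For the database case, and as a blunt safety net for the scraper case, the trimming can be sketched as below. Truncation is a cheaper substitute for the summarization step described above, not an equivalent: it drops content instead of condensing it. MAX_TOOL_CHARS and both helper names are hypothetical, and the right budget depends on your model's context window.

```python
MAX_TOOL_CHARS = 4_000  # hypothetical budget; tune to your model's context window


def trim_rows(rows: list, fields: list) -> list:
    """Keep only the fields the agent actually needs from database rows."""
    return [{key: row[key] for key in fields if key in row} for row in rows]


def cap_text(text: str, limit: int = MAX_TOOL_CHARS) -> str:
    """Hard-truncate long tool output and say so, so the model knows it is partial."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated {len(text) - limit} characters]"
```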


A note on LangSmith vendor attributes

Not all traces use the gen_ai.* convention. Traces exported from LangSmith directly — rather than via an OpenTelemetry Collector — may use langsmith.span.kind instead of gen_ai.agent.name, and langsmith.run.name for the node name. The Agent Trace Inspector normalizes both automatically, so you can paste either format and get the same graph. If your trace renders blank, the likely cause is a vendor namespace that isn't covered yet — check the attributes array on a few spans to see what keys are present.
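The idea behind that normalization can be sketched in a few lines, though this is not the Agent Trace Inspector's actual code. node_name falls back from gen_ai.agent.name to langsmith.run.name to the raw span name, and attribute_namespaces counts key prefixes so you can see at a glance which vendor namespace a trace uses. Both names are hypothetical.

```python
from collections import Counter


def iter_spans(trace: dict):
    """Yield every span in an OTLP JSON document."""
    for resource_spans in trace.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            yield from scope_spans.get("spans", [])


def string_attr(span: dict, key: str):
    """Return the stringValue of an attribute, or None if absent."""
    for kv in span.get("attributes", []):
        if kv.get("key") == key:
            return kv.get("value", {}).get("stringValue")
    return None


def node_name(span: dict) -> str:
    """Prefer the GenAI convention, then the LangSmith vendor key, then the span name."""
    return (
        string_attr(span, "gen_ai.agent.name")
        or string_attr(span, "langsmith.run.name")
        or span.get("name", "<unnamed>")
    )


def attribute_namespaces(trace: dict) -> Counter:
    """Count attribute key prefixes (e.g. 'gen_ai', 'langsmith') across all spans."""
    counts = Counter()
    for span in iter_spans(trace):
        for kv in span.get("attributes", []):
            counts[kv.get("key", "").split(".", 1)[0]] += 1
    return counts
```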


Summary

Failure pattern   | Span signal to look for
Infinite loop     | Multiple spans with the same gen_ai.agent.name (high callCount)
Silent tool error | status.code === 2 on a tool span; check status.message
Token blowout     | High gen_ai.usage.input_tokens on an LLM call span deep in the tree

All three patterns are visible in the raw OTLP JSON: the span hierarchy, status codes, and token attributes carry exactly the information this kind of post-mortem needs. You don't need a dedicated observability platform, just the trace and something that can read it.
