How to Read OTLP Traces from LangGraph, CrewAI, and OpenAI Agents SDK
A guide to the OpenTelemetry GenAI semantic conventions — gen_ai.agent.name, gen_ai.tool.name, gen_ai.usage.input_tokens — and how to use them to debug LangGraph ReAct loops, CrewAI pipelines, and OpenAI Agents handoffs.
When a LangGraph ReAct loop, a CrewAI pipeline, or an OpenAI Agents SDK handoff goes wrong in production, the first thing you need is the trace. Most agent frameworks now emit OpenTelemetry spans following the GenAI semantic conventions, which means the structure of the trace is predictable — if you know what to look for.
The six attributes that matter
Every span in an agent trace carries attributes. The GenAI semantic conventions define a small set that describes the AI-specific work:
| Attribute | Meaning |
|---|---|
| gen_ai.agent.name | The agent that produced this span (e.g. ResearchAgent, TriageAgent) |
| gen_ai.tool.name | The tool that was called (e.g. web_search, lookup_account) |
| gen_ai.request.model | The model used for this LLM call (e.g. gpt-4o, claude-3-5-sonnet-20241022) |
| gen_ai.system | The framework or provider (langgraph, crewai, openai_agents, anthropic) |
| gen_ai.usage.input_tokens | Input tokens consumed by this LLM call |
| gen_ai.usage.output_tokens | Output tokens produced by this LLM call |
Spans that carry neither gen_ai.agent.name nor gen_ai.tool.name are typically raw LLM calls (a chat openai span, a chat anthropic span). The instrumentation library emits them whenever the agent makes an inference call, and they are children of the agent span that triggered them.
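The classification described above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: it assumes spans in OTLP JSON form, where attributes arrive as a list of key/value objects and 64-bit integers are encoded as strings. The helper names span_attributes and classify_span are hypothetical, and treating any span with gen_ai.request.model as a raw LLM call is an assumption made for illustration.

```python
def span_attributes(span):
    """Flatten OTLP JSON attributes into a plain dict."""
    out = {}
    for attr in span.get("attributes", []):
        value = attr.get("value", {})
        if "stringValue" in value:
            out[attr["key"]] = value["stringValue"]
        elif "intValue" in value:
            # OTLP JSON encodes 64-bit integers as strings.
            out[attr["key"]] = int(value["intValue"])
        elif "doubleValue" in value:
            out[attr["key"]] = float(value["doubleValue"])
        elif "boolValue" in value:
            out[attr["key"]] = bool(value["boolValue"])
    return out


def classify_span(span):
    """Label a span as 'tool', 'agent', 'llm', or 'other' (assumed heuristic)."""
    attrs = span_attributes(span)
    if "gen_ai.tool.name" in attrs:
        return "tool"
    if "gen_ai.agent.name" in attrs:
        return "agent"
    if "gen_ai.request.model" in attrs:
        return "llm"
    return "other"


llm_span = {
    "name": "chat openai",
    "attributes": [
        {"key": "gen_ai.request.model", "value": {"stringValue": "gpt-4o"}},
        {"key": "gen_ai.usage.input_tokens", "value": {"intValue": "120"}},
    ],
}
kind = classify_span(llm_span)
input_tokens = span_attributes(llm_span)["gen_ai.usage.input_tokens"]
```

Checking the tool attribute first is a design choice: some instrumentations may also stamp the agent name on tool spans, and the tool label is the more specific one.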
How span parentage maps to agent structure
The parent-child relationship between spans is how you read the agent graph. A LangGraph ReAct agent produces a span structure like this:
langchain.workflow (root, no parentSpanId)
└── ResearchAgent (gen_ai.agent.name = "ResearchAgent")
    ├── web_search (gen_ai.tool.name = "web_search")
    ├── chat openai (bare LLM call, no agent name)
    ├── calculator (gen_ai.tool.name = "calculator")
    └── chat openai (bare LLM call, no agent name)
The bare chat openai spans don't have gen_ai.agent.name, but their parent is ResearchAgent. That means all tokens on those spans belong to ResearchAgent. To get the true token cost of the agent, sum the tokens on the agent span itself and on every descendant LLM call span whose nearest agent ancestor is that agent.
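As a rough sketch of that attribution rule, the Python below walks each span up its parent chain until it finds a span with an agent name, then credits the span's tokens to that agent. The flattened span shape (span_id, parent_id, agent, input_tokens, output_tokens) and the function name tokens_per_agent are assumptions for illustration, not an exporter format.

```python
from collections import defaultdict


def tokens_per_agent(spans):
    """Credit each span's tokens to its nearest ancestor (or self) with an agent name."""
    by_id = {s["span_id"]: s for s in spans}

    def owning_agent(span):
        current, seen = span, set()
        # Walk upwards; the 'seen' set guards against malformed parent cycles.
        while current is not None and current["span_id"] not in seen:
            seen.add(current["span_id"])
            if current.get("agent"):
                return current["agent"]
            current = by_id.get(current.get("parent_id"))
        return None

    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for span in spans:
        agent = owning_agent(span)
        if agent is None:
            continue  # e.g. the workflow root span
        totals[agent]["input"] += span.get("input_tokens", 0)
        totals[agent]["output"] += span.get("output_tokens", 0)
    return dict(totals)


# The ReAct trace from above, in the assumed simplified shape.
react_spans = [
    {"span_id": "1", "parent_id": None},
    {"span_id": "2", "parent_id": "1", "agent": "ResearchAgent"},
    {"span_id": "3", "parent_id": "2"},  # web_search
    {"span_id": "4", "parent_id": "2", "input_tokens": 100, "output_tokens": 20},
    {"span_id": "5", "parent_id": "2"},  # calculator
    {"span_id": "6", "parent_id": "2", "input_tokens": 150, "output_tokens": 30},
]
totals = tokens_per_agent(react_spans)
```

If an instrumentation already reports an aggregated total on the agent span, this sum would count those tokens twice, so it is worth checking which spans actually carry usage attributes before trusting the result.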
For an OpenAI Agents SDK handoff, the transferred agent becomes a child span of the handing-off agent:
agents.Runner.run
└── TriageAgent (gen_ai.agent.name = "TriageAgent")
    ├── chat openai (TriageAgent's LLM call)
    └── BillingAgent (gen_ai.agent.name = "BillingAgent", handoff)
        ├── lookup_account (gen_ai.tool.name = "lookup_account")
        └── chat openai (BillingAgent's LLM call)
The edge TriageAgent → BillingAgent is the handoff. The span hierarchy makes the transfer explicit: BillingAgent.parentSpanId points to TriageAgent.
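Handoff edges can be recovered mechanically from parentage. The sketch below assumes the same simplified span shape used for illustration (span_id, parent_id, and an agent field set only on agent spans). The function name handoff_edges is hypothetical.

```python
def handoff_edges(spans):
    """Return (from_agent, to_agent) pairs where an agent span's parent is another agent span."""
    by_id = {s["span_id"]: s for s in spans}
    edges = []
    for span in spans:
        if not span.get("agent"):
            continue
        parent = by_id.get(span.get("parent_id"))
        if parent is not None and parent.get("agent") and parent["agent"] != span["agent"]:
            edges.append((parent["agent"], span["agent"]))
    return edges


handoff_spans = [
    {"span_id": "1", "parent_id": None},                       # agents.Runner.run
    {"span_id": "2", "parent_id": "1", "agent": "TriageAgent"},
    {"span_id": "3", "parent_id": "2"},                        # chat openai
    {"span_id": "4", "parent_id": "2", "agent": "BillingAgent"},
    {"span_id": "5", "parent_id": "4"},                        # lookup_account
    {"span_id": "6", "parent_id": "4"},                        # chat openai
]
edges = handoff_edges(handoff_spans)
```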
Getting the trace out of LangSmith or Langfuse
Both LangSmith and Langfuse can export traces as OTLP JSON. In LangSmith, open a run → export → choose OTLP format. In Langfuse, navigate to a trace → export → JSON. The exported file follows the standard resourceSpans structure with scopeSpans and spans arrays. Any service using the OpenTelemetry Collector can also write traces to a file exporter in the same format.
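Reading the exported file is mostly a matter of walking that nesting. Here is a minimal sketch that flattens resourceSpans, scopeSpans, and spans into a single stream. The inline JSON is an invented two-span example, and iter_spans is a hypothetical helper name.

```python
import json


def iter_spans(otlp):
    """Yield every span from an OTLP JSON document, regardless of resource or scope."""
    for resource_span in otlp.get("resourceSpans", []):
        for scope_span in resource_span.get("scopeSpans", []):
            for span in scope_span.get("spans", []):
                yield span


# Invented sample payload; a real export carries many more fields per span.
exported = json.loads("""
{"resourceSpans": [{"scopeSpans": [{"spans": [
  {"spanId": "a1", "name": "langchain.workflow"},
  {"spanId": "b2", "parentSpanId": "a1", "name": "ResearchAgent"}
]}]}]}
""")
names = [span["name"] for span in iter_spans(exported)]
```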
Once you have the JSON, paste it into ContextIQ's Agent Trace Inspector to render the agent graph, see per-node token attribution, and inspect the timeline of every span — no SDK setup required.
What the trace tells you that logs don't
A flat log stream shows you what happened; a trace shows you why it happened in that order. Three things are only visible in the trace:
- Which agent consumed the most tokens — not the run total, but the per-agent breakdown including every LLM call it triggered
- Tool call latency vs LLM latency — a 3-second run is very different if it's one slow tool call vs three sequential LLM calls
- Handoff depth — how many times control transferred between agents, and whether any agent ran more than once (a loop)
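The last two checks can be approximated directly from span timestamps and agent names. The sketch below assumes spans have already been labelled with a kind ("agent", "tool", or "llm") and carry nanosecond start and end times under the hypothetical keys start_ns and end_ns (OTLP JSON calls these startTimeUnixNano and endTimeUnixNano). It sums durations per kind, which overstates wall-clock time if spans run in parallel.

```python
from collections import Counter


def latency_by_kind(spans):
    """Total duration in milliseconds spent in tool spans vs LLM spans."""
    totals = Counter()
    for s in spans:
        if s["kind"] in ("tool", "llm"):
            totals[s["kind"]] += (int(s["end_ns"]) - int(s["start_ns"])) // 1_000_000
    return dict(totals)


def repeated_agents(spans):
    """Agents that appear in more than one agent span, a hint that the run looped."""
    counts = Counter(s["agent"] for s in spans if s["kind"] == "agent")
    return {name: n for name, n in counts.items() if n > 1}


# Invented timings: one 2-second tool call followed by two LLM calls.
timed_spans = [
    {"kind": "agent", "agent": "ResearchAgent", "start_ns": 0, "end_ns": 1_500_000_000},
    {"kind": "tool", "start_ns": 0, "end_ns": 2_000_000_000},
    {"kind": "llm", "start_ns": 2_000_000_000, "end_ns": 2_400_000_000},
    {"kind": "agent", "agent": "ResearchAgent", "start_ns": 2_400_000_000, "end_ns": 3_000_000_000},
    {"kind": "llm", "start_ns": 2_400_000_000, "end_ns": 3_000_000_000},
]
latency = latency_by_kind(timed_spans)
loops = repeated_agents(timed_spans)
```

Agent spans are left out of the latency totals because their duration already includes the tool and LLM calls nested beneath them.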
For CrewAI sequential crews, the trace shows ResearcherAgent and WriterAgent as siblings under the crewai.kickoff root span, with each agent's tool calls nested beneath it. The timeline makes it immediately obvious which agent was the bottleneck.