Free tool · No sign-up required
Token Inspector —
compare cost across every LLM.
Paste any text — a system prompt, document, few-shot examples, or conversation — and instantly compare token counts and API costs across GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, DeepSeek V3, Llama 3.1, and 20+ more models in a single view.
Models covered
How it works
Exact counts where possible
Uses tiktoken's BPE encodings (o200k_base for GPT-4o, cl100k_base for GPT-3.5/4) to give exact counts for OpenAI models, not estimates. Counts for other models are character-ratio approximations, clearly marked with ~.
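The two counting paths can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names, the model keys, and the 4-characters-per-token ratio are assumptions. The only tiktoken calls used are `get_encoding` and `encode`.

```python
import math

# Encoding names come from the description above; the model keys are illustrative.
OPENAI_ENCODINGS = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}


def exact_count(text: str, model: str) -> int:
    """Exact token count for an OpenAI model (needs `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the approximate path works without it

    encoding = tiktoken.get_encoding(OPENAI_ENCODINGS[model])
    # disallowed_special=() makes strings such as "<|endoftext|>" in pasted
    # text count as ordinary text instead of raising an error.
    return len(encoding.encode(text, disallowed_special=()))


def approx_count(text: str, chars_per_token: float = 4.0) -> int:
    """Character-ratio estimate. The 4.0 default is an assumed rule of thumb."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def count_tokens(text: str, model: str) -> str:
    """Return a display string, prefixing estimates with '~'."""
    if model in OPENAI_ENCODINGS:
        return str(exact_count(text, model))
    return f"~{approx_count(text)}"
```

A real character ratio would need to be calibrated per model family and per language, since non-English text and code usually tokenize less efficiently than English prose.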
Real cost estimates at scale
Set expected output tokens and daily request volume. The table calculates cost per request and projected monthly spend for every model — so the GPT-4o vs Claude Sonnet vs Gemini Flash decision becomes a spreadsheet, not a guess.
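The arithmetic behind those two columns is simple enough to sketch. The example below assumes per-million-token pricing and a 30-day month. `ModelPrice`, the function names, and the prices are hypothetical placeholders, not real provider rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


def cost_per_request(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request with the given token counts."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


def monthly_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days_per_month: int = 30) -> float:
    """Projected monthly spend; the 30-day month is an assumption."""
    per_request = cost_per_request(price, input_tokens, output_tokens)
    return per_request * requests_per_day * days_per_month


# Placeholder prices for illustration only.
example = ModelPrice("example-model", input_per_million=2.50, output_per_million=10.00)
print(f"per request: ${cost_per_request(example, 1_000, 500):.4f}")
print(f"per month:   ${monthly_cost(example, 1_000, 500, 10_000):,.2f}")
```

Because output tokens are usually priced higher than input tokens, the expected output length often dominates the comparison, which is why it is entered separately.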
Custom model support
Add any model not in the default list — set your own input/output price per million tokens. Useful for private deployments, fine-tuned models, or newly released models before the list updates.
Who uses it
Prompt engineers
Paste a system prompt draft and instantly see how many tokens it consumes — before you ship it to production at 10,000 requests/day.
AI product teams
Compare the cost of running your feature on Claude 3.5 Sonnet vs GPT-4o mini vs Gemini Flash. Make the model selection decision on data, not vibes.
RAG pipeline builders
Measure how many tokens your retrieved chunks consume and whether they fit inside GPT-4o's 128k context window or Gemini 1.5 Pro's 1M window.
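A context-fit check of this kind might look like the sketch below. The window sizes are the rounded figures quoted above. The function name, the model keys, and the idea of reserving tokens for the system prompt and the response are illustrative assumptions.

```python
# Rounded context-window sizes as quoted in the text; check provider docs
# for exact limits.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "gemini-1.5-pro": 1_000_000,
}


def fits_in_context(chunk_token_counts: list[int], model: str,
                    reserved_tokens: int = 0) -> bool:
    """True if the retrieved chunks fit in the model's context window.

    reserved_tokens is budget held back for the system prompt, the user
    question, and the model's response (a hypothetical knob).
    """
    budget = CONTEXT_WINDOWS[model] - reserved_tokens
    return sum(chunk_token_counts) <= budget


chunks = [50_000, 60_000, 30_000]  # token counts of three retrieved chunks
print(fits_in_context(chunks, "gpt-4o"))          # 140,000 tokens vs 128,000
print(fits_in_context(chunks, "gemini-1.5-pro"))  # 140,000 tokens vs 1,000,000
```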
Tokenization accuracy
OpenAI models (GPT-4o, GPT-4, GPT-3.5) use tiktoken running client-side in the browser: exact counts, no API call required. Anthropic, Google, Meta, and other models use character-ratio approximations; these are clearly labeled. All pricing data is updated manually and may not reflect the latest rate changes, so always verify against the provider's pricing page before committing to production spend.