LLM API Cost Calculator — compare 36 models in seconds

Paste a prompt to count tokens, dial in your traffic, and see exactly what GPT, Claude, Gemini, DeepSeek, Llama, Mistral and more will cost you per month — with a sortable table, value scores, a context-window fit finder, shareable scenarios and CSV export.

1 · Estimate your prompt & set your traffic

Token counts use a client-side BPE-approximate estimator (±10–15%). Nothing you paste ever leaves your browser.

est. tokens

words

characters

Requests / day

Avg input tokens / request

Avg output tokens / request

requests / month

input tokens / mo

output tokens / mo

cheapest monthly

2 · Cheapest model that fits your context window

Your context need = input + output tokens per request (plus headroom for history or RAG chunks). Models that don't fit are dimmed in the table below.
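The fit check described above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual implementation: the function name cheapest_fit, the dictionary keys, and the two catalog entries are hypothetical, and the ranking assumes cost is computed from the slider values only, with headroom counted toward context but not toward billing.

```python
def cheapest_fit(models, input_tokens, output_tokens, headroom=0):
    """Return the cheapest model whose context window covers the request.

    models: list of dicts with hypothetical keys
        "name", "context" (tokens), "in_price", "out_price" (USD per 1M tokens).
    Returns None when no model is large enough.
    """
    need = input_tokens + output_tokens + headroom
    fitting = [m for m in models if m["context"] >= need]
    if not fitting:
        return None

    # Rank by per-request cost at the current settings (assumption: headroom
    # is reserved context, not billed traffic).
    def per_request_cost(m):
        return (input_tokens * m["in_price"] + output_tokens * m["out_price"]) / 1_000_000

    return min(fitting, key=per_request_cost)


# Made-up catalog for illustration only; not real models or prices.
CATALOG = [
    {"name": "small-8k", "context": 8_000, "in_price": 0.20, "out_price": 0.80},
    {"name": "large-128k", "context": 128_000, "in_price": 1.00, "out_price": 4.00},
]
```

With 6,000 input and 1,000 output tokens, the small model fits and wins on price; adding 4,000 tokens of headroom pushes the requirement to 11,000 tokens, so only the larger model qualifies.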

I need at least ___ tokens of context (auto from sliders)

3 · Side-by-side model comparison

Click any column to sort. Prices are standard API list rates in USD per 1M tokens — estimates as of June 2026.

fits context only

Model	Tier	In $/1M	Out $/1M	Context	Quality*	Monthly	Value

✎ Spotted a stale price? Suggest a correction

Monthly spend by provider

Cheapest model vs. highest-quality model per provider at your current traffic (log scale).

Cut your AI bill

Affiliate

TraceStack

Teams report 15–30% savings

LLM observability that surfaces your ten most expensive prompts and flags silent token bloat in production.

Try TraceStack free →

Affiliate

CacheWarp

Cache hits cost ~$0

Semantic caching proxy: serve repeat and near-duplicate questions from cache instead of paying for fresh generations.

Start caching →

Affiliate

ModelMux

Route 60% of traffic to cheap models

A smart router that sends easy requests to budget models and reserves frontier models for the hard ones.

Route smarter →

📬 Monthly AI pricing digest

One email a month: every price change across 13 providers, plus the updated CSV. No spam, unsubscribe anytime.

Get the digest →

How LLM API pricing actually works

Every major LLM provider bills the same way: you pay separately for input tokens (everything you send — system prompt, conversation history, retrieved documents) and output tokens (everything the model writes back). Output tokens are typically 3–8× more expensive than input tokens, which is why a chatty model with long answers can quietly cost several times more than a terse one at identical request volume. A token is roughly four characters of English text, or about three-quarters of a word; code, non-Latin scripts and unusual punctuation tokenize less efficiently, so budget extra headroom for those workloads.
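As a back-of-the-envelope check, the two rules of thumb above (about four characters per token, about three-quarters of a word per token) can be turned into a rough estimator. This is a sketch of those heuristics only, not the BPE-approximate estimator the page uses; the function name and the choice to average the two estimates are assumptions made for illustration.

```python
import math

CHARS_PER_TOKEN = 4.0   # rule of thumb for English prose
WORDS_PER_TOKEN = 0.75  # a token is roughly three-quarters of a word


def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text.

    Averages a character-based and a word-based estimate, then rounds up.
    Expect it to undercount code and non-Latin scripts.
    """
    if not text.strip():
        return 0
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    return math.ceil((by_chars + by_words) / 2)
```

For anything billing-critical, a provider's own tokenizer or token-counting endpoint is the better source.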

Monthly cost is simple arithmetic once you know three numbers: requests per day, average input tokens per request, and average output tokens per request. TokenTally multiplies those out over a 30-day month against each model's per-million-token rates. The biggest savings levers, in rough order of impact: shorten your system prompt (it's resent on every request), cap output length, use prompt caching for repeated prefixes (most providers discount cached input 50–90%), batch non-urgent jobs (typically 50% off), and route easy requests to a cheaper tier instead of sending everything to a frontier model.
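The arithmetic described above looks roughly like this in Python. It is a minimal sketch: the function name and the example prices are hypothetical, and the rates are plain list prices with no caching or batch discounts applied.

```python
def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 in_price_per_m, out_price_per_m, days=30):
    """Estimated monthly spend in USD for one model.

    Prices are USD per 1M tokens; token counts are averages per request.
    """
    requests = requests_per_day * days
    input_cost = requests * input_tokens / 1_000_000 * in_price_per_m
    output_cost = requests * output_tokens / 1_000_000 * out_price_per_m
    return input_cost + output_cost


# Hypothetical model priced at $1.00 in / $4.00 out per 1M tokens:
# 30,000 requests -> 45M input tokens ($45) + 12M output tokens ($48).
example = monthly_cost(1_000, 1_500, 400, 1.00, 4.00)
```

In this example the output side costs slightly more than the input side even though it carries less than a third of the tokens, which is the effect of the higher output rate.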

Frequently asked questions

How accurate is the token counter?

It's a BPE-approximate estimator that runs entirely in your browser — it mimics how modern tokenizers split words, numbers, punctuation and CJK characters, and is typically within ±10–15% of the real count for English prose. Each provider uses a slightly different tokenizer (o200k, Claude's tokenizer, SentencePiece variants), so even "exact" counts differ between models. For billing-critical work, use the provider's official token-counting endpoint.

Where do the prices come from, and how fresh are they?

Prices are standard pay-as-you-go API list rates in USD per million tokens, compiled from public provider pricing pages and dated in the badge at the top of the page. They are estimates: providers change rates frequently, and the table excludes batch discounts, cached-input rates, long-context surcharges, and negotiated enterprise pricing. If you spot a stale number, use the "suggest a correction" link under the table.

What is the value score?

Value = our editorial quality estimate (0–100, weighted heavily) divided by your per-request cost at the current slider settings, normalized so the best model in view scores 100. It rewards capable-but-cheap models and updates live as you move the sliders — a model that wins at 400 output tokens may lose at 4,000. Quality estimates are editorial judgments blending public benchmarks and community evals, not an official benchmark.
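One way to read that definition is sketched below. The exact weighting behind "weighted heavily" is not stated here, so the exponent on quality is purely an assumption, as are the function name, dictionary keys, and the two example models.

```python
def value_scores(models, input_tokens, output_tokens, quality_exponent=2.0):
    """Quality-per-dollar scores, normalized so the best model scores 100.

    models: list of dicts with hypothetical keys
        "name", "quality" (0-100), "in_price", "out_price" (USD per 1M tokens).
    quality_exponent is an assumed stand-in for the heavy quality weighting.
    Assumes every model has a non-zero per-request cost.
    """
    raw = {}
    for m in models:
        cost = (input_tokens * m["in_price"] + output_tokens * m["out_price"]) / 1_000_000
        raw[m["name"]] = m["quality"] ** quality_exponent / cost
    best = max(raw.values())
    return {name: 100 * score / best for name, score in raw.items()}


# Made-up entries for illustration only.
scores = value_scores(
    [
        {"name": "budget", "quality": 60, "in_price": 0.20, "out_price": 0.80},
        {"name": "frontier", "quality": 90, "in_price": 3.00, "out_price": 15.00},
    ],
    input_tokens=1_000,
    output_tokens=400,
)
```

Because cost depends on the input and output token counts, rerunning the function with different slider values can reorder the ranking.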

Why do output tokens cost so much more than input tokens?

Output tokens are generated sequentially: each new token needs its own forward pass through the model, and that pass is usually limited by memory bandwidth rather than raw compute. Input tokens, by contrast, are processed in parallel in a single pass, which uses the hardware far more efficiently. Output is therefore more expensive to serve per token, and pricing reflects that. Practical upshot: setting a sensible max_tokens and asking for concise answers is one of the highest-leverage cost optimizations available.

How big a context window do I actually need?

Add up: system prompt + conversation history you keep + retrieved documents + the user's message + the maximum response you allow. For a typical chatbot that's 4K–32K tokens; for RAG over long documents, 64K–200K; for whole-codebase or book-length work, 500K+. Note that many models get slower and slightly less accurate near their context limit, and some providers charge premium rates above a threshold — so "fits" isn't the same as "optimal". The finder above highlights the cheapest model that clears your requirement.
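The sum above is easy to keep honest with a small helper. The class and field names below are hypothetical, and the example figures are illustrative rather than measured.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Per-request context components, all in tokens."""
    system_prompt: int
    history: int
    retrieved_docs: int
    user_message: int
    max_response: int

    def total(self) -> int:
        return (self.system_prompt + self.history + self.retrieved_docs
                + self.user_message + self.max_response)

    def utilization(self, context_window: int) -> float:
        """Fraction of a model's context window this budget would use."""
        return self.total() / context_window


# Example: a small RAG chatbot needing 11,000 tokens per request.
budget = ContextBudget(system_prompt=800, history=3_000, retrieved_docs=6_000,
                       user_message=200, max_response=1_000)
```

Checking utilization against a candidate model's window shows how much slack remains before the limit, which matters given that behavior can degrade close to it.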

How can I cut my LLM bill without changing models?

Five proven levers: (1) trim your system prompt — at 1,000 requests/day, every 100 tokens removed saves ~3M input tokens a month; (2) enable prompt caching for stable prefixes; (3) use batch APIs for anything that can wait an hour (usually 50% off); (4) cap and compress outputs — ask for bullet points, not essays; (5) add a router or cascade so cheap models handle the easy 60–80% of traffic. Observability tools (see sidebar) help you find which prompts are actually burning the budget.
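The figure in lever (1) is straightforward to reproduce. The sketch below assumes a 30-day month, and the input price used in the example is hypothetical.

```python
def trimmed_prompt_savings(requests_per_day, tokens_removed, in_price_per_m, days=30):
    """Monthly input tokens and USD saved by shortening the system prompt.

    in_price_per_m is the model's input rate in USD per 1M tokens.
    """
    tokens_saved = requests_per_day * tokens_removed * days
    dollars_saved = tokens_saved / 1_000_000 * in_price_per_m
    return tokens_saved, dollars_saved


# 1,000 requests/day with 100 tokens trimmed, at a hypothetical $2.00 per 1M input tokens.
tokens_saved, dollars_saved = trimmed_prompt_savings(1_000, 100, 2.00)
```

The dollar amount scales linearly with traffic and with the input rate, so the same trim is worth far more on a high-volume or frontier-priced workload.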

Is my pasted prompt sent anywhere?

No. TokenTally is a fully static page — the tokenizer, calculator, chart and CSV export all run client-side in your browser. There is no backend, no analytics on your prompt text, and nothing is transmitted when you type or paste.

Estimates only. All prices, quality scores and projections on this page are editorial estimates for planning purposes. They are not quotes or financial advice, and this site is not affiliated with any model provider. Verify current pricing on each provider's official page before committing to a budget. Affiliate links may earn us a commission at no cost to you.