05 Deep Dive March 2026
How AI Companies Are Charging You More Without You Even Realizing It
You pay for what you use. That's the deal. Except it's not.
When you use an AI model — GPT-4, Claude, Gemini — you do not pay per word. You pay per token. And that tiny technical detail can quietly cost you up to 60% more for the exact same request, depending on which company you choose.
60% Extra cost for non-English speakers
420× Price gap between cheapest & priciest model
0 Standardization across providers
What Is a Token, Really?
Before we get to the money, a crash course. Tokens are not words. They are subword units produced by a compression algorithm called BPE (Byte Pair Encoding) — originally a data-compression technique, repurposed for NLP in the 2010s. The algorithm learns frequent character sequences in a corpus and groups them into single vocabulary entries.
The catch: every AI company trains its own tokenizer on its own corpus with its own vocabulary size. The result is that the same word gets sliced differently depending on who's counting:
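The core BPE loop is simple enough to sketch in a few lines of standard-library Python: count the most frequent adjacent symbol pair in a toy corpus, merge it into one vocabulary entry, repeat. Real tokenizers like tiktoken and SentencePiece add byte-level handling and much larger corpora, but this toy version shows why the learned merges — and therefore the token count for any given word — depend entirely on the training corpus:

```python
from collections import Counter

def learn_merges(words, num_merges):
    """Learn BPE merges from a toy corpus.
    `words` maps a word (as a tuple of symbols) to its frequency."""
    vocab = dict(words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        # Replace every occurrence of the winning pair with one merged symbol.
        new_vocab = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges, vocab

# A corpus where "-able" words are frequent learns those merges early.
# A different corpus would learn different merges -- and therefore
# split the same word into a different number of tokens.
corpus = {tuple("unable"): 10, tuple("unbelievable"): 2, tuple("notable"): 5}
merges, vocab = learn_merges(corpus, 4)
print(merges)
```

Run on this corpus, the learner merges `a+b`, then `ab+l`, then `abl+e`, then `u+n` — so "unable" collapses to just two tokens, while rarer words stay fragmented.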
OpenAI · tiktoken
"unbelievable"
un believ able
Total tokens 3
Google · SentencePiece
"unbelievable"
▁un believable
Total tokens 2
Anthropic · Proprietary
"unbelievable"
un be liev able
Total tokens 4
Same word. Three different prices. The bill you receive depends not on what you said — but on which tokenizer counted it.
The Dirty Secret — Tokens Are Not Standardized
There is no ISO standard for AI tokens. No regulatory body. No published audit. Each major provider uses a different system:
| Provider | Tokenizer | Vocab size |
|---|---|---|
| OpenAI | tiktoken (cl100k_base / o200k_base) | ~100k |
| Google | SentencePiece (older) + custom (Gemini) | ~256k |
| Anthropic | Proprietary — barely documented | ~?? |
| Meta LLaMA | BPE | ~32k |
| Mistral | Custom BPE | ~32k |
Anthropic's tokenizer is particularly opaque. There is no public specification, no open-source release, and the documentation amounts to a single paragraph in their pricing FAQ. You are billed by a black box.
The Language Tax
The most damaging consequence of non-standardized tokenization is what we call the Language Tax. English — specifically American English — was the dominant language in most training corpora. As a result, English tokenizes efficiently. Every other language pays a premium.
| Language | Overhead vs English | Relative Cost |
|---|---|---|
| English | baseline | 1.0× |
| Spanish | +62% | 1.6× |
| French | +54% | 1.5× |
| German | +62% | 1.6× |
| Russian | +154% | 2.5× |
| Arabic | +208% | 3.1× |
| Hindi | +392% | 4.9× |
A Spanish speaker pays 60% more tokens for the same content. A Hindi speaker pays nearly 5× more. The pricing page lists the same dollar rate per million tokens — but the number of tokens you consume is quietly different depending on your language.
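The arithmetic behind this is straightforward to sketch. The overhead figures below are the approximate averages from the table, and the 1.3 tokens-per-English-word constant is a common rule of thumb, not a guarantee for any specific input:

```python
# Approximate token overhead vs English, from the table above.
OVERHEAD = {
    "english": 1.00,
    "spanish": 1.62,
    "french": 1.54,
    "german": 1.62,
    "russian": 2.54,
    "arabic": 3.08,
    "hindi": 4.92,
}

TOKENS_PER_ENGLISH_WORD = 1.3  # rough rule of thumb

def estimated_cost(words: int, language: str, usd_per_million_tokens: float) -> float:
    """Estimated dollar cost of `words` words of text in `language`."""
    tokens = words * TOKENS_PER_ENGLISH_WORD * OVERHEAD[language]
    return tokens * usd_per_million_tokens / 1_000_000

# The same 1,000-word document at $5/M input tokens:
for lang in ("english", "spanish", "hindi"):
    print(f"{lang}: ${estimated_cost(1000, lang, 5.0):.4f}")
```

Same document, same dollar rate per million tokens — and the Hindi version still costs nearly five times as much, purely because the tokenizer slices it finer.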
The Pricing War
On top of tokenization differences, the pricing gap between providers has exploded. As of March 2026:
| Provider / Model | Input $/M | Output $/M | Note |
|---|---|---|---|
| Google Gemini Flash-Lite | $0.10 | $0.40 | Cheapest viable |
| Google Gemini 2.5 Pro | $1.25 | $10 | Strong value |
| OpenAI GPT-4o | $3 | $10 | Mainstream |
| Anthropic Claude Opus 4.6 | $5 | $25 | Standard |
| Anthropic Claude Opus 4.6 (Fast) | $30 | $150 | Speed premium |
| OpenAI GPT-5.2 Pro (projected) | $21 | $168 | Most expensive |
💸 420× Price Gap
Between GPT-5.2 Pro output ($168/M) and Gemini Flash-Lite ($0.40/M), there is a 420× price difference — for models both marketed as "AI assistants." The gap is real, and growing.
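The 420× figure is just the ratio of the cheapest and priciest output rates in the table above — easy to verify:

```python
# Output prices in $/M tokens, from the table above (March 2026 rates).
prices_out = {
    "gemini_flash_lite": 0.40,
    "gemini_2_5_pro": 10.0,
    "gpt_4o": 10.0,
    "claude_opus_4_6": 25.0,
    "claude_opus_4_6_fast": 150.0,
    "gpt_5_2_pro": 168.0,
}
cheapest = min(prices_out.values())
priciest = max(prices_out.values())
print(f"Output price gap: {priciest / cheapest:.0f}x")  # prints "Output price gap: 420x"
```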
Same Prompt, Different Bill
Let's make this concrete. Take a real-world agent task: 100-word user message + 500-word system prompt + 200-word response. English vs Spanish, same content:
| | English | Spanish | Difference |
|---|---|---|---|
| User message (100w) | ~130 tok | ~210 tok | |
| System prompt (500w) | ~650 tok | ~1,050 tok | |
| Response (200w) | ~260 tok | ~404 tok | |
| **Total** | ~1,040 tok | ~1,664 tok | +60% |
At Claude Opus 4.6 rates ($5/M input, $25/M output), with the user message and system prompt billed as input and the response as output:
English: ~780 input tok + ~260 output tok → ~$0.0039 + ~$0.0065 = ~$0.0104 per call
Spanish: ~1,260 input tok + ~404 output tok → ~$0.0063 + ~$0.0101 = ~$0.0164 per call
That is roughly $0.006 extra per call — about $6,000 per month for a Spanish-language app making one million calls.
This is not a rounding error. At scale — millions of agent calls per month — the language tax becomes a serious cost factor, and most teams discover it only after they've already committed to a provider and a language.
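The per-call math scales linearly, so the monthly impact is a one-liner to check. This sketch uses the token counts from the example above and bills the user message plus system prompt as input and the response as output:

```python
PRICES = {"claude_opus_4_6": {"input": 5.0, "output": 25.0}}  # $/M tokens

def call_cost(input_tokens: int, output_tokens: int, price: dict) -> float:
    """Dollar cost of one API call at the given $/M-token rates."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1e6

# Token counts from the worked example: user msg + system prompt are input.
english = call_cost(130 + 650, 260, PRICES["claude_opus_4_6"])
spanish = call_cost(210 + 1050, 404, PRICES["claude_opus_4_6"])

calls_per_month = 1_000_000
print(f"English: ${english:.4f}/call  Spanish: ${spanish:.4f}/call")
print(f"Language tax at 1M calls/month: ${(spanish - english) * calls_per_month:,.0f}")
```

At a million calls a month, the gap between the two languages alone is on the order of thousands of dollars — invisible on the pricing page, very visible on the invoice.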
When Tokens Became a Fake Currency
This pattern has happened before. When cloud computing emerged in the 2000s, every major provider invented their own unit of compute: AWS had EC2 hours, Azure had Credits, Google had Compute Units. Each defined differently. Each deliberately opaque. Comparison required a spreadsheet — and that friction always benefited the seller.
AI has recreated the same opacity with tokens. A "token" from OpenAI is not the same as a "token" from Anthropic, which is not the same as a "token" from Google. They share a name and nothing else.
The uncomfortable truth: Tokens are a brilliant business model. Abstract enough that most users don't think deeply about them. Defined differently by every player. Non-comparable by design. And confusion, in markets with asymmetric information, always benefits the seller.
The Solution: TokensTree
We built TokensTree precisely because this problem is structural — it won't be fixed by any single provider, because it's in their interest to maintain the fog. The answer has to be infrastructural.
Two mechanisms address this directly:
SafePaths with Remote Cache: Verified command paths are stored once and reused across agents. The first agent that solves a problem pays the full token cost. Every subsequent agent retrieves the cached result for a fraction of the tokens. Like Bazel build caching for AI knowledge — repeated computations are cached, shared, and reused. Token consumption drops. Latency drops. The language of the requesting agent becomes irrelevant to the token cost of the stored answer.
Cross-provider token accounting: TokensTree normalizes token counts across providers, so you can see what a task actually costs — not what each provider's tokenizer claims it costs. One dashboard. Real comparisons. No fog.
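One way to make raw token counts comparable is to calibrate each provider's tokenizer against a fixed reference text, then express usage in provider-neutral units. The sketch below illustrates that idea only — the two `count_*` functions are hypothetical stand-ins (in practice they would call tiktoken, SentencePiece, and so on), and TokensTree's actual accounting is not published in this form:

```python
from typing import Callable

# Hypothetical per-provider token counters, for illustration only.
# Real counters would invoke each provider's actual tokenizer.
def count_provider_a(text: str) -> int:
    return max(1, len(text) // 4)  # ~4 chars per token

def count_provider_b(text: str) -> int:
    return max(1, len(text) // 5)  # ~5 chars per token

REFERENCE_TEXT = "The quick brown fox jumps over the lazy dog. " * 50

def normalization_factor(counter: Callable[[str], int]) -> float:
    """Tokens this provider charges per character of the reference text."""
    return counter(REFERENCE_TEXT) / len(REFERENCE_TEXT)

def normalized_tokens(raw_tokens: int, counter: Callable[[str], int]) -> float:
    """Convert a provider's raw token count into provider-neutral units
    (here: characters-worth of reference text)."""
    return raw_tokens / normalization_factor(counter)

# 1,000 raw tokens are NOT the same amount of text on both providers:
print(normalized_tokens(1000, count_provider_a))
print(normalized_tokens(1000, count_provider_b))
```

The design choice is the same one behind currency conversion: pick a common basket (the reference text), measure each vendor against it, and only then compare prices.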
Every 1B tokens saved = 1 tree planted. When token efficiency is the mission, not just a talking point, the incentives align differently. We save tokens because it matters — for cost, for access equity, and for the planet.
If the language tax is the toll you pay at every call, tokenstree.eu is the route optimizer that finds the cheapest crossing before your prompt even reaches the tokenizer. It intercepts requests automatically — translating them into the most BPE-efficient encoding, sending them to the model, then returning the response in your language. Your French stays French. Your Spanish stays Spanish. The token count drops in the middle. That is what fighting the fog looks like in practice.
TokensTree is building the infrastructure for a more efficient AI economy. Token pricing data reflects publicly available rates as of March 2026 and is subject to change. Language tax ratios are approximate averages across common use cases, not guarantees for specific inputs. tokenstree.com