From Token-maxxing To Token-panic: Citrini Warns AI Goldilocks Narrative Hitting A Wall

Original link: https://www.zerohedge.com/ai/token-maxxing-token-panic-citrini-says-ai-goldilocks-narrative-hitting-wall

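The AI industry is moving from an era of unrestrained “token-maxxing” into a period of deep “token panic.” Driven by explosive growth in token usage, companies that spent heavily are now hitting a wall, as high operating costs and the labs’ aggressive monetization eat into IT budgets. Signs of the shift are increasingly visible: companies such as Uber exhausted their AI budgets within months, and senior figures at OpenAI and Microsoft have confirmed that cost efficiency has become a top customer concern. To protect margins, major providers including Anthropic, Google, and OpenAI have moved to usage-based billing, and often push costs higher through opaque “tokenizer” changes.

The era of “free AI” is ending, and the burden of subsidizing AI infrastructure is shifting from venture capital to corporate P&Ls. That reality is forcing a rethink: companies are reassessing whether expensive frontier models deliver a return on investment, which is driving a surge of interest in cost-effective open-source alternatives and small specialized models. Frontier AI remains valuable for high-stakes tasks, but the market’s focus has shifted to efficiency, observability, and local deployment, signaling that the next phase of the AI industry will be defined by cost control rather than blind expansion.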

Original text

When the world and their pet rabbit were buying the hype and extrapolating trends to infinity and beyond, we dared to highlight a few 'economic' realities of the new 'tokenomics'.

From Singularity To Tokenomics: The AI Narrative Just Hit A Serious Snag
Was Amazon's Tokenmaxxing Fiasco Behind Claude's $500M Mystery Bill?
From Singularity To Tokenomics, Part II: The Subsidy Just Ran Out - And GitHub Users Went Splat

This morning we got confirmation of this AI reality questioning from none other than Goldman Sachs Partner, Rich Privorotsky, who highlighted that Token Spend had 'peaked'...

And now, Citrini Research - who infamously issued a less than utopian view of the world under AI back in March - has written a follow-up on the status quo of the AI ecosystem, noting that in just weeks we’ve gone from tokenmaxxing to tokenpanic.

In March, we and many others were writing about the astounding growth in token consumption driven by the release of agents and more intensive models.

This was enough to send the infrastructure trade sharply higher – the market value of the semiconductor industry doubled in two months.

But that goldilocks narrative is beginning to hit a wall. The corollary of explosive token usage is explosive cost to customers, which is coming just as the US labs and hyperscalers are turning up the dial on monetization. The public story is increasingly turning to corporate pushback.

The first real signs of this shift were from the much-discussed report of Uber burning through its entire AI budget in just four months.

Then there was the anonymous report of a $500 million oopsie.

In the past week, the idea has turned into a media avalanche.

According to The Economist’s reporting, Anthropic’s ARR has increased 5x since the start of the year, reaching $45 billion in May.

Great for the lab, but it also means the “AI Opex” line item on P&Ls is going through the roof.

The issue is not just Anthropic. Sam Altman also confirmed that all of a sudden cost is a huge issue (and acknowledged the virality of the idea).

“Probably the second biggest theme is just around cost. People are really saying, it’s kind of become a meme now, but, “My company spent my entire 2026 budget in Q1. Can you make this more efficient?” We are continuing to push on that more with models. I think we’ll have a lot of ways we can help people get more value for less spend, but that went from, at the beginning of this year, an issue that never came up. I know. People were totally happy with the amount they were spending, to all of a sudden, a huge issue.”

Microsoft’s AI Chief added to the unflattering commentary this week after cancelling Claude Code licenses in May.

“Anthropic is extremely expensive, and I think many people are urgently looking for alternatives”

This cost concern didn’t just come out of nowhere.

First, agents and more advanced reasoning models use orders of magnitude more tokens.

Corporates have widely distributed these tools and encouraged their use just as the average user was gaining the ability to casually run up enormous bills.

Second, prices for frontier models are increasing as providers flip to usage-based pricing and prepare for public market debuts.

In a unified front, OpenAI, Anthropic, Microsoft, and Google have all implemented pricing shifts towards usage/tokens, as they simply can’t afford to endlessly subsidize their products for power users.

April 2: OpenAI changed Codex pricing to align with API token usage instead of per-message pricing.
May 19: Google changed Gemini subscriptions from “daily prompt limits” to a “compute-used” model.
June 1: Microsoft’s GitHub Copilot transitioned to usage-based billing.

And what does a rate sheet really mean if you have no idea what your usage burns in practice?

Claude’s Opus 4.7 & 4.8 have the same “list price” as prior versions, but use a “new tokenizer” that may use up to 35% more tokens for the same fixed text.
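To make the tokenizer effect concrete, here is a minimal sketch of the arithmetic, assuming cost scales linearly with token count at an unchanged list price. The 35% figure is the upper bound quoted above; the price per million tokens and the token count are hypothetical placeholders, not actual rates.

```python
def effective_cost(tokens: int, price_per_million: float,
                   token_inflation: float = 0.0) -> float:
    """Cost of processing a fixed text when the tokenizer emits more tokens.

    token_inflation is the fractional increase in token count for the same
    text (0.35 means 35% more tokens). The list price is held constant.
    """
    billed_tokens = tokens * (1.0 + token_inflation)
    return billed_tokens / 1_000_000 * price_per_million


# Hypothetical inputs: 2M tokens under the old tokenizer at $10 per 1M tokens.
old_cost = effective_cost(2_000_000, 10.0)
new_cost = effective_cost(2_000_000, 10.0, token_inflation=0.35)
increase = new_cost / old_cost - 1.0

print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}, increase: {increase:.0%}")
```

Under this assumption, the bill for the same text rises by the same fraction as the token count, even though the rate sheet has not changed.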

Is this an existential problem or just the VC playbook at unimaginable scale?

Subsidize demand, gain market share and lock-in, then monetize. After all, companies are spending a trillion in capex to make trillions in revenue, right?

Well, either way, we’ve reached monetization, and maybe not by choice. As fast as lab revenue is growing, the fundraising has grown even faster.

The money going towards building and running AI has exploded. The deepest pockets in the world – hyperscaler cash flow, venture capital, sovereign wealth, public credit, private credit, public equity – are footing most of the bill. Eventually, customers have to start picking up the tab.

Free-AI is ending. Tokenomics is beginning.

What happens when underlying costs of compute become more transparent and directly traceable to outcomes? The ROI debate is about to be answered in real time, across millions of users and use cases.

For the median user, maybe not a whole lot changes. But science projects, freewheeling agents, and curiosities will either get cut or offloaded to open source models. Companies will restrict AI functionality and invest in oversight and observability. Budget constraints will pit AI spend against headcounts. Providers will become more competitive on pricing and will begin to optimize physical and digital architecture for efficiencies.

In many (most) situations, good enough will do. The cost of running open-source, discount, or mini models is going down while their capabilities only improve. This week saw another batch of open-source models, such as Nvidia’s latest Nemotron family, which includes advanced general-purpose models as well as highly efficient, compact versions optimized for local deployment and specialized agentic uses. As the frontier continues to advance, inference costs drop precipitously for a fixed level of intelligence. Why rent a Ferrari when a Vespa does the trick?

Of course, frontier models with highly specialized functions can continue to command an intense premium, but will serve a smaller segment of the market. A top lawyer can still bill at thousands per hour, even if millions of other workers are making minimum wage.

But even across the high end, the gap between US and Chinese offerings is worth noting. Qwen 3.7 and Deepseek V4 are still behind Opus 4.8 and GPT 5.5 in terms of benchmarks, but they are 10x - 25x cheaper.
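As a rough illustration of why that price gap matters, the sketch below computes blended spend when some share of tasks is routed to a cheaper model. The 10x and 25x ratios come from the range quoted above; the frontier cost per task, the task volume, and the 70% routing share are hypothetical, and the sketch ignores any quality difference between the models.

```python
def blended_cost(tasks: int, frontier_cost_per_task: float,
                 cheap_ratio: float, cheap_share: float) -> float:
    """Total spend when cheap_share of tasks go to a model that is
    cheap_ratio times cheaper than the frontier model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    cheap_cost_per_task = frontier_cost_per_task / cheap_ratio
    frontier_tasks = tasks * (1.0 - cheap_share)
    cheap_tasks = tasks * cheap_share
    return (frontier_tasks * frontier_cost_per_task
            + cheap_tasks * cheap_cost_per_task)


# Hypothetical workload: 100,000 tasks at $0.50 per task on a frontier model.
baseline = blended_cost(100_000, 0.50, cheap_ratio=10, cheap_share=0.0)
for ratio in (10, 25):
    cost = blended_cost(100_000, 0.50, cheap_ratio=ratio, cheap_share=0.7)
    print(f"{ratio}x cheaper, 70% routed: ${cost:,.0f} "
          f"({1 - cost / baseline:.0%} below all-frontier spend)")
```

Under these assumptions, most of the saving comes from moving volume off the frontier model at all; going from 10x to 25x cheaper changes the total comparatively little, because the tasks that stay on the frontier model dominate the bill.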

Since releasing V4 Pro and V4 Flash in April, Deepseek has shot past Anthropic to the top of the charts on OpenRouter in terms of tokens processed.

Meanwhile, Cursor, one of the most used coding agents, released its new model, which was post-trained on compute provided by xAI after their $10 billion deal. The base model is a different Chinese open-source model by Moonshot, and it was trained on data Cursor gets from its customers. The results are even stronger than Deepseek’s: it is comparable to 4.7 and 5.5 at 10x lower cost per task, and is one of the fastest frontier models.

There are obvious other “considerations” for large US enterprises that may prevent a mass exodus to Chinese alternatives. Plus, greater integration into workflows adds to lock-in. But there is a growing trend of application layer companies that will continue to post-train on open source base models for specialized workflows like coding and legal.

But what does this mean for the AI trade?

First, to be clear, revenues for labs and hyperscalers are going to grow. Token usage for top Anthropic models continues to go higher. Regardless of the pushback, frontier models can certainly create meaningful value especially in high-stakes fields like tech and finance, and there are still plenty of levers to pull in the monetization phase. The entire point is for them to start making money.

Likewise, this won’t fix near-term compute constraints.

But we do think that cost and efficiency only become more important as the bills get bigger. Themes of local inference, miniaturization, smart routing, observability, price competition, and efficient model architecture will grow. Competitive pressures and price competition are likely to stay.

Subscribers can read the rest of Citrini's note here...
