Tokenmaxxing 已死，Tokenmaxxing 万岁。

Tokenmaxxing 已死，Tokenmaxxing 万岁。
Tokenmaxxing is dead, long live tokenmaxxing

原始链接: https://12gramsofcarbon.com/p/agentics-tech-things-tokenmaxxing

“代币最大化”（Tokenmaxxing）——即大举投入资金使用 AI 代币——最初是高管们为强推员工采用 AI 工具而采取的一种粗暴手段。尽管随着成本攀升和补贴减少，这一阶段已有所降温，但其背后的逻辑正在演变为一种更可持续的新策略：“复合准确性”（compounding correctness）。与早期容易出错的智能体不同，现代模型在增加代币消耗后能产出更好的结果。这使得关注点从盲目消耗转向了“工作量证明”经济，尤其是在网络安全等领域——通过比攻击者投入更多算力来发现漏洞，已成为一种制胜策略。随着循环智能体工作流和“软件工厂”成为标配，在无需人工监督的情况下实现任务自动化的动力将再次回归。目前，市场正在从脆弱的、由咨询公司构建的智能体流水线，转向灵活的通用平台。虽然一些公司可能会暂时缩减开支，但长期趋势仍倾向于高容量的代币消耗，因为人工智能驱动的“黑灯工厂”（全自动化工厂）正朝着完全自主的方向发展。随着经济实惠的开源模型和专用硬件的兴起，对高吞吐量、智能体性能的优化需求，确保了“代币最大化”极有可能会卷土重来。

本次讨论探讨了“Tokenmaxxing”（代币最大化）这一概念，即通过激励员工最大程度地使用人工智能代币。一种观点认为，“Tokenmaxxing”是一种暂时的战略过渡手段，旨在迫使员工探索并将人工智能融入工作流程。一旦员工了解了该技术的能力与局限性，公司便可缩减开支。然而，其他评论者对企业决策持怀疑态度。他们认为，管理层往往在不了解实际效用的情况下盲目推行人工智能指令。此外，参与者质疑了企业领导层本质上精打细算的说法。持怀疑态度的人指出，高管经常在“虚荣”项目上浪费大量资金，例如多余的品牌推广活动、不必要的顾问咨询以及随意的品牌重塑；这些行为更多是出于职业姿态，而非真正的商业价值。归根结底，这场讨论凸显了将人工智能作为过渡工具的计算性使用与企业低效支出的现实之间的矛盾。

Generally speaking, if you spend tens of thousands of dollars on something, you want to see something come out on the other end. Some return on investment.

O sure, not always. I’ve previously said that selling to consumers is sorta funny because they love spending money on things that waste time or actively cause pain. This is part of why the gambling apps are so popular these days. Why yes, I’d love to spend $100 on betting that Wemby scores a 3 pointer while doing a handstand and singing the national anthem in French.

But for businesses? I’ve basically never heard a business leader say that they were going to set a bunch of money on fire because it made them feel good, at least not the same way a whale will spend thousands on Genshin Impact gatcha pulls. Like, imagine if some serious business leader, like, idk, Mark Zuckerberg, decided to announce that Meta was going to burn money. He could do that. He’s got the voting shares. But it would be a bit silly, wouldn’t it? I generally think if you’ve gotten to the point where you’re running really big really important companies, you mostly aren’t doing things for kicks, with one big exception.

If you haven’t heard, tokenmaxxing is (was?) a phenomenon where executives accidentally encouraged their employees to burn a bunch of tokens on useless tasks. The canonical example of this is, by complete coincidence, Meta, which has been thoroughly skewered for tying performance evaluations to the amount of token usage per person. Obviously, obviously this was going to lead to people just burning tokens on nothing. One of my friends at Meta reported that they literally would just have two agents talking to each other throughout the day to get her token numbers up.

This was such an obvious outcome that many people rounded this off as “these business leaders are really dumb because they decided to burn a bunch of money on tokens without expecting any return.”

I understand why that’s a tempting take, because that is kinda sorta what the public face of a lot of this was. But I’m going to do my favorite thing in the world, which is be a bit contrarian. It wasn’t that “executives accidentally encouraged their employees to burn a bunch of tokens on useless tasks.” Rather, “executives purposely encouraged their employees to burn a bunch of tokens on useless tasks.”

I work with a lot of teams on figuring out how to use AI effectively. A few months ago, there were a lot of people who were extremely resistant to using AI tools at all. Senior people, people that had a lot of respect in the organization. It was very difficult to convince these folks to use the tools. And when you did, they would often accidentally (or purposely?) use the tools in a way that would obviously lead to weird or bad outcomes.

One way to think about the top down tokenmaxxing policies is that this was a technique by executives to break through. Yes, it was obviously a blunt force policy, but sometimes you need blunt force to break through a wall.

Of course, that was the situation a few months ago, when there were still holdouts. It’s now a few months later, and the tokenmaxxing policies had their intended outcome: everyone is using AI to code, at least a little bit. Most teams haven’t yet figured out how to build their own Ramp Inspect or Stripe Minions (if that’s you, reach out — we can help!) but basically everyone is at least using cursor in the side bar. Which, of course, means that token spend has gone way up. Unfortunately, but probably not unexpectedly, the increase in token spend has lined up with both OpenAI and Anthropic trying to go public. Both companies have limited the amount of juice their subscriptions provide while jacking up their API pricing. Token subsidies are increasingly vanishing.

So now the incentives are mostly gone and the cost is way up and, of course, teams are starting to roll back their unlimited-token-spend policies. All of this to say, tokenmaxxing is dead.

Except…maybe not.

The promise of AI tools generally is that you can have them run without human supervision to accomplish really hard and really tedious tasks that still need to be done. The big code migration, doing research on all your competitors every morning, keeping up with the stream of inbound and outbound — these are all things that people mostly hate doing and want AI to do.

Up until recently, though, you couldn’t reliably have an AI run for long periods of time. If you tried, you would notice that small errors introduced by the models (including hallucinations) would take on a life of their own and eventually become irreversibly embedded into the project. In the business we called this “compounding error.” It not only required a fair bit of human supervision, it also kept token costs low because there was little benefit in running agents 24/7 to begin with. Like, what’s the point of running a little demon in your computer over night if the thing is just going to tear up all your hard work? If spending more tokens results in worse work, you obviously aren’t going to spend more tokens!

That’s no longer true. We’ve entered a different regime, where spending more tokens generally results in better results. We call this “compounding correctness” — the more tokens you spend on getting a task correct, the more likely you’ll get a good outcome. We talked about this a bit at the last in person Agentics meetup:

Compounding correctness flips the calculus. If more token spend leads to better outcomes, then you’re going to want to spend a lot of time running tokens. Which sure as hell sounds like tokenmaxxing to me! The original incentives to tokenmax are gone, but eventually folks will realize that a new and more powerful incentive has take its place.

We’ve already seen some of this take place in the cyber security world:

Last week we learned about Anthropic’s Mythos, a new LLM so “strikingly capable at computer security tasks” that Anthropic didn’t release it publicly. Instead, only critical software makers have been granted access, providing them time to harden their systems.
…
This chart suggests an interesting security economy: to harden a system we need to spend more tokens discovering exploits than attackers spend exploiting them.
AISI budgeted 100M tokens for each attempt. That’s $12,500 per Mythos attempt, $125k for all ten runs. Worryingly, none of the models given a 100M budget showed signs of diminishing returns. “Models continue making progress with increased token budgets across the token budgets tested,” AISI notes.
If Mythos continues to find exploits so long as you keep throwing money at it, security is reduced to a brutally simple equation: to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them.
You don’t get points for being clever. You win by paying more. It is a system that echoes cryptocurrency’s proof of work system, where success is tied to raw computational work. It’s a low temperature lottery: buy the tokens, maybe you find an exploit. Hopefully you keep trying longer than your attackers.

Fable is, tragically, gone now. But the underlying concept here still remains.

This is also in part why people are suddenly so excited about ‘loops.’ Boris Cherny, the creator of Claude Code, got up on stage and said ‘loops’ and everyone freaked out. The basic idea behind loops is that you run an agent until it reaches the end of its turn, and then when it finishes you simply restart the same prompt. With a bit of cleverness you can take a pretty heavy specification and automatically have the agent split it into parts and solve it over time. No human supervision required.

Is this some new thing? No, not really. The loop concept has been around since literally last July. It used to be called a “Ralph Wiggum loop,” but as the industry has matured so has our sense of humor and the ‘Ralph Wiggum’ part was dropped.

r/TheSimpsons - What scene or moment makes you feel most painfully sorry for a character?

There were ways to get loops to work, but it was hard. You had to think a lot about how to prompt the agent, which in turn required a pretty deep familiarity with how these things work. Now, though, it’s easy. Compounding correctness makes it easy. You can basically prompt the LLM however you want and to a first approximation, it will do better every iteration of the loop.

So is tokenmaxxing really dead? Maybe temporarily, but long term I don’t think so. Teams that are at the cutting edge are currently building or have built the infrastructure necessary to run agents 24/7. It’s only a matter of time before the bigcos realize that the cost benefit has shifted again.

The real winners here are the open model platforms. Tokenmaxxing the top labs will never stand up to any amount of CFO scrutiny. As open models get better, it will become more popular to simply run those in a loop. That was the core thesis of Rohan’s talk above. If Claude gives you 1.1x improvement per iteration, and GLM 5.2 gives you 1.05 improvement per iteration but costs ~5x less, you can just run the second loop 5x more times and it will be better.

The last thing I want to mention here is that some of the ridiculous token spend is downstream of a serious misunderstanding of the best way to use these tools. Before coding agents really took off (thanks in large part to much better harnesses like Claude Code), lots of people were making their own custom agents. And that was legitimate work! You had to think about this stuff as if it was…well, software. There was an art to figuring out the tools and the prompts but the core of it was still just software, even if it was supported by ‘AI native’ frameworks like Pydantic or Langchain.

You can’t fit a square peg into a round hole. Executives across the board saw this style of building agents, went “o, this is just a more flexible zapier workflow,” and proceeded to demand data processing pipelines that could do one-off tasks that were ‘agentic’ instead of building those same pipelines in good ol’ deterministic code. ‘I need to do data labeling, so I will build a data labeling agent’, that sort of thing.

Now, relying on an agent to do some of this stuff is already going to be significantly more expensive than just doing a workflow automation. But the bigger issue is the accuracy: none of these ‘agents’ ever really took off, because they were never as correct as a deterministic pipeline would be.

If you’re committed to using agents but want to reduce the cost of hallucinations and things, what do you do? Why, you build another agent! A ‘quality checking’ agent, or something like that. And what if that agent gives you errors? Well you’ll just build another! And now you have 3x the token cost, enjoy!

The story of tokenmaxxing is, again, one of RoI. That story didn’t just play out at the bigtechcos. It also happened at a less advanced scale at companies all over the country — companies who poured billions into random agent pipelines built by one off consultants that unfortunately never really quite worked all that well.

Notice that these are actually two different kinds of tokenmaxxing.

The first kind is ‘spend a lot of money on tokens for your developers‘. Here, devs are using tools like Claude Code and figuring out how to run things in loops and using a lot of tokens to do it. Ostensibly this is a good use of money because it’s making the engineers themselves more productive.

The second is ‘spend a lot of money on tokens for your pipelines‘. Here, devs are still writing code by hand! They are using that code to create one-off agents to do very specific tasks often in a non-deterministic and brittle way, and it’s those agents that guzzle up all the tokens. This is only a good use of money if the pipelines work, which they don’t.

But here, too, we are seeing a shift. Increasingly, these sorts of one-off pipeline-based tools are better done by generalist platforms that are skinned for the specific task, than an “agent” specially designed to do that one task. There’s some market arbitrage here. Some buyers haven’t realized that generalist agents have gotten really good, so they will go to consultants asking to ‘build me an agent’, and the consultant essentially writes a skill file and says “that will be $2m please.”

Luckily, this too shall pass. Generalist model platforms are obviously the future for anyone who has used them (and if you haven’t, again, reach out!) And that, again, will lead to another rise in tokenmaxxing behavior in this part of the market.

The natural end state of all of this is the ‘software factory’ or, even further, the ‘dark factory’ — a codebase that pumps out code, reviews code, fixes bugs, writes tests, and so on without any human supervision. The human simply puts in a spec and out comes an application. The folks over at StrongDM have taken this to the furthest extreme, arguing that engineers should aim to spend $1000 in tokens per day. This is almost certainly hype, part of a long trend of saying egregious things to get coverage and buzz. We have a software factory, and we spend like $600 per month. But the hype and buzz comes about because, even though it is currently ridiculous to spend the price of a senior google engineer in tokens per engineer, there is a kernel of truth to this. The incentive to spend ludicrous amounts of money on tokens are there, latent, waiting for diffusion.

What’s old is new again and what’s dead may never die. Tokenmaxxing is dead, but we haven’t seen the last of tokenmaxxing just yet.

GPT 5.6 is out, kinda sorta. From the announcement:
We’re beginning a limited preview of the GPT‑5.6 series: Sol, our flagship model; Terra, a balanced model for everyday work; and Luna, a fast and affordable model. Terra has competitive performance to GPT‑5.5 while being 2x cheaper and Luna brings strong capability at our lowest cost.
…
We believe in broad access, and we plan to make GPT‑5.6 Sol, Terra, and Luna generally available in the coming weeks. As part of our ongoing engagement with the U.S. government, we previewed our plans and the models’ capabilities ahead of today’s launch. At their request, we are starting with a limited preview for a small group of trusted partners whose participation has been shared with the government, before releasing more broadly.
…
We don’t believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them. We are taking this short-term step because we believe it is the strongest path to broader availability in the coming weeks, while we work with the Administration to develop the cyber Executive Order framework and a repeatable process for future model releases.
Washington Post was even more aggressive in its analysis:
U.S. government will decide who gets to use latest upgrade to ChatGPT
The Trump administration came to power preaching a laissez-faire approach to AI but has lately increased oversight of the industry.
Reading between the lines, it looks like we are getting more government regulation of AI companies, but unfortunately the process as it currently stands is completely opaque. Reactions are mixed. On the one hand it’s great that the administration’s previous attacks on Anthropic are also being applied to their competition. On the other hand, it seems like the process is totally opaque, and now the government is unilaterally picking winners and losers not just among AI companies, but among every other industry where AI may matter (i.e. all of them). The companies who get to use OpenAI’s tooling (and Mythos, see below) are currently unknown. It would be very concerning / disappointing if we find out that it is exclusively companies with ties to this administration.
Related: Mythos is back on the table, at least sort of. From Semafor:
US releases powerful Anthropic model Mythos to some US companies
The US government Friday lifted its block on Anthropic’s powerful Claude Mythos 5 AI model, allowing the company to release it to more than 100 US institutions, including major companies and government agencies.
The decision, in a letter sent Friday afternoon to Anthropic, is a major de-escalation in the confrontation between the Trump Administration and one of the world’s most valuable private companies. Two weeks ago the administration imposed export controls on Mythos, leading to a shut down of the model and its cousin Fable 5 after warnings from Amazon and other companies that they could be “jailbroken” for malicious purposes.
The letter is silent on Fable 5, a weaker version of Mythos that was briefly the most powerful AI model widely available to consumers. People close to the talks said they are moving toward releasing Fable as well, though that timeline is unclear.
“I have determined that appropriate safeguards are in place to permit certain trusted partners to access the Claude Mythos 5 Model,” Commerce Secretary Howard Lutnick wrote to Anthropic’s chief compute officer Tom Brown Friday, citing “significant progress” in the intense, daily talks between the government and the company since the block went into effect.
Again, picking winners and losers.
While we are talking about OpenAI’s launch, one thing that was buried in the announcement was that the tooling would be available on Cerebras’s high-speed inference machines at ~750 tokens per second. This is quite fast. Right now we are in a regime where it makes sense to treat AI tools as async operators that can go off and do tasks without supervision. But that is mostly downstream of the fact that AI is slow and it takes a while to do things. If AI was really really fast you would plausibly go back towards a more synchronous model of operating. For an idea of what this may look like, check out https://chatjimmy.ai/. It’s just a demo, but wow, what a demo.
Open models like GLM 5.2 have gotten pretty damn good. They aren’t SotA, but they are much cheaper than their frontier equivalents. Right now, GLM 5.2 is ~$1.4 per million tokens and ~$4 per million output tokens. By contrast, the entire Opus 4.X series is $5 per million input and a whopping $25 per million output. The only model in the Anthropic suite that even comes close to GLM 5.2 in terms of pricing is Haiku 4.5, at $1 / million input and $5 / million output. But GLM 5.2 blows Haiku out of the water, and in some cases is even stronger than GPT 5.5 on benchmarks. If I were the big labs, I’d be pretty concerned about this. And if I were basically any consumer on the market, I would be doing everything I can to avoid provider lock in by adopting tools that are able to sit on top of all of the major players.
OpenAI unveiled an in house chip for inference.
On Wednesday, OpenAI unveiled its first custom-built inference processor, designed and manufactured in collaboration with Broadcom. Named Jalapeño, the new processor was designed specifically for the unique needs of OpenAI’s inference systems. OpenAI’s own AI models assisted in the development of the chip, the company said.
Jalapeño, like a jalapeño chip. Get it?

Agentics is the study of how to use and reason about agents. If you are an expert in coding agents, or interested in learning more about agents, join our community slack. More articles here. Learn more about how Nori can bring your company into the glorious AI future at norisessions.com.

Tokenmaxxing 已死，Tokenmaxxing 万岁。 Tokenmaxxing is dead, long live tokenmaxxing

Tokenmaxxing 已死，Tokenmaxxing 万岁。
Tokenmaxxing is dead, long live tokenmaxxing