Claude Code's binary reveals silent A/B testing of core functionality.
Please Do Not A/B Test My Workflow

Original link: https://backnotprop.com/blog/do-not-ab-test-my-workflow/

A Claude Code user paying $200/month describes frustration with an undisclosed A/B test by Anthropic that significantly degraded the tool's "plan mode", a core part of their workflow. Without warning or opt-in, the user was assigned to a highly restrictive test variant ("cap") that limits plan length and strips key elements such as context and prose explanations, making the experience far less interactive and useful. The user discovered this by reverse-engineering the application and identifies the lack of transparency as the central problem. While acknowledging that A/B testing is necessary for optimization, they argue that degrading core functionality without user awareness or control is unacceptable, especially in a professional tool. They advocate for configurability and transparency in AI tooling, emphasizing that users need to understand and steer the AI process rather than be unknowingly subjected to disruptive experiments. The post, currently at the top of Hacker News, calls for responsible AI deployment and for letting users "own their process."

## Claude Code A/B Testing and User Concerns

A recent Hacker News thread details concerns about Anthropic's Claude Code, specifically "silent" A/B testing of core features. The author found that the platform appears to be quietly altering functionality, potentially degrading performance for some users, without notice.

The discussion centers on the ethics of this kind of testing, with some commenters comparing it to Meta's practices. While A/B testing is not inherently negative, reducing a tool's effectiveness as part of a test was widely considered unacceptable. Many commenters also pointed to the general unreliability of LLMs, questioning their suitability for professional use given the lack of reproducible results.

Concerns extended to Anthropic's lack of transparency around quota limits and product changes, modifications its terms of service permit. Several users expressed frustration with the "rental" nature of AI tools and their potentially unpredictable behavior, contrasting this with the reliability expected of professional software. The debate touches on broader questions of responsible AI development and the ethical implications of training on potentially stolen data.

Original post

I’m a big fan of Claude Code. It’s completely changed how I work, and I’ve been a fan of Anthropic since day one. The founders’ research is something I genuinely admire. Experiencing my own workflow degrade over the past week was frustrating, and this post was written in that frustration. I’ve since revised it to be more accurate and fair in tone. It’s currently #1 on Hacker News, otherwise I’d probably just delete it.

Anthropic is running A/B tests on Claude Code that actively degrade my workflow. I wish I could opt out.

I don’t think A/B testing is inherently wrong. I don’t think Anthropic is doing this to intentionally degrade anyone’s experience. They’re clearly trying to optimize. But the test design matters, and vastly reducing the effectiveness of a core feature like plan mode is not acceptable test design.

I pay $200/month for Claude Code. It’s a professional tool I use to do my job, and I need transparency into how it works and the ability to configure it. What I don’t need is critical functions of the application changing without notice, or being signed up for disruptive testing without my approval. We need to be responsible with how we steer these tools (AI), and we need to be enabled to do so. Transparency is a critical part of that. Configurability is a critical part of that.

Every day, engineers complain about regressions in Claude Code. Half the time, the answer is: you’re probably in an A/B test and don’t know it.

I dug into the Bun package to try to understand what was different. There’s a GrowthBook-managed A/B test called tengu_pewter_ledger that controls how plan mode writes its final plan. Four variants: null, trim, cut, cap. Each one progressively more restrictive than the last.

A/B test variant groups found in the binary
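GrowthBook-style experiments typically assign each user to a bucket with a deterministic hash of the user id, so the same user sees the same variant across sessions. The sketch below is an illustration of that general mechanism, not the actual code from the binary; the hash function and weighting are assumptions.

```javascript
// Illustrative only: deterministic variant assignment in the style of a
// GrowthBook experiment. Variant names come from the decompiled binary;
// the hashing scheme here is a generic FNV-1a stand-in, not Anthropic's code.
const VARIANTS = ["null", "trim", "cut", "cap"];

// FNV-1a 32-bit hash: stable mapping from a string to an unsigned integer.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hash the experiment key plus user id into one of the four buckets,
// so assignment is sticky per user without any server round-trip.
function assignVariant(userId, experimentKey) {
  const bucket = fnv1a(`${experimentKey}:${userId}`) % VARIANTS.length;
  return VARIANTS[bucket];
}
```

The practical upshot of sticky bucketing is that a user assigned to `cap` stays in `cap` day after day, which is why a degraded experience feels like a regression rather than random noise.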

The default variant gives you a full context section, prose explanation, and a detailed verification section. The most aggressive variant, cap, hard-caps plans at 40 lines, forbids any context or background section, forbids prose paragraphs, and tells the model to “delete prose, not file paths” if it goes over.

The cap variant instructions
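The 40-line hard cap is enforced through prompt instructions in the binary, not application code, but the constraint itself is simple to express. A minimal sketch, assuming the cap is a plain line count:

```javascript
// Sketch of the "cap" variant's length constraint as code. The 40-line
// limit is from the decompiled prompts; expressing it as a checker
// function is my framing, since the binary enforces it via instructions
// to the model rather than a programmatic check.
const CAP_MAX_LINES = 40;

function exceedsCap(planText) {
  return planText.split("\n").length > CAP_MAX_LINES;
}
```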

I got assigned cap. There was no question/answer phase. I entered plan mode, and it immediately launched a sub-agent, generated its own plan with zero discourse, and presented me a wall of terse bullet points. No back and forth. No steering. Just a fait accompli. Here’s what a plan looks like under cap:

Example plan output under the cap variant

There was no opt-in. No notification. No toggle. No way to know this was happening unless you decompiled the binary yourself.
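Finding this doesn't require a full decompiler: the experiment and telemetry identifiers share a `tengu_` prefix, so scanning the bundled JavaScript for that pattern surfaces them. The snippet below shows the technique against a stand-in string; the actual bundle path varies by install method and is not shown here.

```javascript
// Sketch: surface experiment/telemetry keys by scanning bundle text for
// the "tengu_" prefix seen in the binary. `bundleText` is a stand-in;
// in practice you would read the installed cli bundle's contents instead.
const bundleText = 'gb.eval("tengu_pewter_ledger"); d("tengu_plan_exit", {})';

// Collect, dedupe, and sort every identifier matching the prefix.
const keys = [...new Set(bundleText.match(/tengu_[a-z_]+/g))].sort();
```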

At plan exit, the variant gets logged with telemetry:

d("tengu_plan_exit", {
  planLengthChars: R.length,  // size of the generated plan
  outcome: Q,                 // plan approval or denial
  clearContext: !0,           // minified `true`
  planStructureVariant: h,    // ← your variant ("cap")
});

The code shows they collect data like plan length, plan approval or denial, and variant assignment. What metrics they’re using downstream isn’t clear from the binary alone. What is clear is that paying users are the experiment.
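One plausible downstream metric from these events would be approval rate per variant. This aggregation is purely my speculation about the analysis, not anything found in the binary; the `"approved"`/`"denied"` outcome values are assumed labels for whatever `Q` actually carries.

```javascript
// Speculative sketch: aggregate tengu_plan_exit events into an approval
// rate per variant. Event shape mirrors the telemetry call above; the
// outcome strings are assumptions, not values confirmed from the binary.
function approvalRateByVariant(events) {
  const tally = {};
  for (const e of events) {
    const t = (tally[e.planStructureVariant] ??= { approved: 0, total: 0 });
    t.total += 1;
    if (e.outcome === "approved") t.approved += 1;
  }
  return Object.fromEntries(
    Object.entries(tally).map(([v, t]) => [v, t.approved / t.total])
  );
}
```

If a terser `cap` plan gets approved about as often as a full plan, a metric like this would score the variants as equivalent, even though the interactive back-and-forth that approval rates can't capture is exactly what was lost.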

This is the opposite of transparency and responsible AI deployment. AI tooling needs more transparency, not less. I need the ability to own my process and guide AI with a human in the loop.
