Lines of Code Got a Better Publicist

Original link: https://curlewis.co.nz/posts/lines-of-code-got-a-better-publicist/

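The way the software industry measures developer success has shifted, from real outcomes such as reliability, revenue, and customer value to a reliance on "AI vanity metrics". New benchmarks such as "percent of code written by AI" or "AI maturity ladders" are just volume metrics that track adoption intensity, not measures of business impact. Although research on AI productivity remains complicated and often contradictory, the cross-study consensus is that organisational gains are modest (around 10%). Yet companies are increasingly using vague productivity narratives to justify large layoffs, replacing rigorous performance evaluation with arbitrary volume figures. The trend is dangerous because these metrics influence budgets and headcount planning. We already have battle-tested ways to track engineering health, such as DORA metrics and meaningful business growth. While adopting AI tools is essential to staying competitive, companies must resist the temptation to replace evidence-based performance evaluation with superficial AI output statistics. Ultimately, leadership should treat AI as a tool for delivering more value, not as a reason to abandon established accountability. The core challenge remains: distinguishing AI-driven output volume from actual business outcomes.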

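This Hacker News discussion highlights growing unease in the industry about the trend of using AI to generate huge volumes of code. Commenters are sceptical of an "AI-native engineering" trend that prioritises lines-of-code (LoC) output over actual product value or maintainability. Many participants argue that pursuing "a million lines of code per engineer" as a productivity metric is Goodhart's law in action: once a measure (lines of code) becomes a target, it becomes misleading, and the result is so-called "code slop". The debate covers several core themes:

* **Utility versus volume:** Critics argue that internal engineering announcements often boast about code volume without explaining the product's purpose or its value to users.
* **Terminology:** Some suggest that renaming "technical debt" to "slop" may be more effective at conveying the dangers of unchecked AI-generated code to management.
* **Pragmatism:** While some defend AI-driven refactoring as a legitimate automation tool, many engineers warn that the industry is losing focus, and that the goal should be to build profitable, sustainable software rather than merely to maximise output.

Overall, the sentiment reflects growing fatigue with AI hype and a desire to return to pragmatic, value-oriented software development.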
Original text

It’s fifteen years ago (bear with me, I’ve been in this industry since the late 90s, most of my good stories start this way), and you’ve got two senior developers at a SaaS company. One of them writes 40% more lines of code than the other. Is that developer better? More impactful for the business? Should the other one be polishing their CV?

Of course not. You’d want to know what actually shipped. What it did for customers, for revenue, for reliability. Lines of code, PR counts… we spent a couple of decades learning these are stereotypically bad ways to measure a developer, to the point where suggesting them today is laughable.

Sooooo… Here’s what the industry put on the billboard this year:

Every single one is a volume claim. “Percent of code written by AI” is just lines of code with a better publicist. (The sceptic in me editing this draft would like to point out that it’s no coincidence that all of these are AI vendors of some kind, so pumping adoption is pretty important to them.)

We used to claim outcomes

Rewind a few years and the headline number was different in kind, not just size. GitHub’s flagship claim was that developers completed tasks 55% faster with Copilot. Say what you like about that study (plenty did), but it was an outcome claim. Bold, falsifiable, about value. If it was wrong, you could show it was wrong.

The 2026 claims can’t fail. That’s the genius of them; “75% of our code is AI-written” could be true, and will keep going up, regardless of whether anything got better (faster delivery, fewer incidents, happier customers, etc). A volume number can only ever disappoint you if adoption stalls, and adoption is the one thing most of us agree is real. 📈

So the claims got bigger and started saying less. What happened in between?

The bit nobody puts on a billboard

The outcome evidence got complicated, that’s what happened.

The strongest pro-adoption result is still Cui et al.; nearly 5,000 developers, +26% completed tasks, with the biggest gains for junior devs. Not really in dispute. But then GitClear showed code churn rising and refactoring collapsing as Copilot adoption deepened. Then METR ran the study many have quoted: experienced open-source devs were 19% slower with AI in their own codebases, while believing they were 20% faster.

But! Hold my beer… in February 2026 METR effectively walked it back: their follow-up estimates flipped to a speedup (with error bars wide enough to ride a Moto Guzzi, with panniers, through!), and they abandoned the study design entirely - because developers now refuse to work without AI, and can’t reliably self-report time on agentic work. Their latest position: AI probably speeds developers up in 2026, and we can no longer cleanly measure by how much.

Meanwhile at the company level, an NBER survey of ~6,000 executives found 69% of firms actively using AI and roughly nine in ten reporting no measurable productivity impact. The cross-study consensus sits somewhere around 10% organisational gains. Not nothing! Still bloody useful! Buuuut, also not “you don’t need developers anymore” territory.

And if you’re a sceptic still quoting “19% slower”, you’re cherry-picking too. The research keeps updating; the industry just changed what it counts.

Vanity metrics, now in AI flavour

It’s not just AI vendor claims, to be fair. Carnegie Mellon’s SEI and Accenture launched an AI Adoption Maturity Model just a few days ago: five levels, eight dimensions, marketed off a stat about 95% of organisations seeing no returns. Steve Yegge’s “8 levels of AI-assisted development” ranks you by which tools you run and how much supervision you give them. And every tools vendor now ships a maturity ladder whose top rung is, usually, “use more of our product”. These ladders measure adoption intensity and call it maturity. Same substitution, nicer packaging.

My favourite data point in this whole genre: Augment surveyed 219 engineering leaders and asked them to define “AI-native engineering”. They got 219 different answers. 🫠

[Image: Spider-Man pointing meme]

And the prize for holding both ends of the rope goes to Anthropic, who gave us the “8x more code shipped” claim and one of the more rigorous studies of the year: an RCT finding that AI-assisted developers scored 17% lower on comprehension of the code they’d just shipped, with no statistically significant productivity gain. I use Claude every single day (it recommended half the links I read for this post, so the irony is not lost on me), the products are genuinely excellent, and their research arm updates while their marketing arm counts volume. Both things are true at once, which is kinda the point.

Why I actually care

Because these numbers aren’t decorative. They move budgets, performance expectations, and headcount plans. In February, Jack Dorsey cut over 40% of Block’s workforce (4,000+ people) with AI as the explicit core thesis: “A significantly smaller team, using the tools we’re building, can do more and do it better.” A couple weeks later, Atlassian cut 10% (~1,600 people), while conceding it would be “disingenuous to pretend AI doesn’t change the mix of skills we need or the number of roles required”. And there’s a key detail that gets me: Dorsey said, in the same announcement, that the business was strong and gross profit was growing.

When a company says “AI made everyone more productive, so we need fewer people”, I want to see the evidence - and I don’t believe it exists today. Show me that x% of your workforce is genuinely idle (or even just underutilised) because the work can now be done by fewer people. Even then: I’ve never seen a product/SaaS company that didn’t have an endless roadmap. If you got a free headcount increase essentially overnight, why wouldn’t you use it to deliver more value to your customers, faster? That should show up as MAU, conversion, revenue. Choosing the layoff instead tells me the productivity claim is doing PR work for a decision that was already made for other reasons (over-hiring, investor pressure, take your pick).

Look, every business carries some fat, and I can accept efficiency-driven trimming as a thing that sometimes legitimately happens - it has at every step change in this industry. But when it happens, try to do so using the individual performance systems you already run, the ones that surface who’s cruising and who’s disengaged. Not token counts. Not “% of code AI-written” or somebody’s level on a maturity ladder. If your selection evidence is a vanity metric, your selection is a lottery wearing lipstick.

Where I land

As I’ve said in previous posts, don’t read any of this as anti-AI. I think every engineer should be using AI daily. Call it AI-first, AI-proficient, whatever you like. Be curious, try the new tools, test the latest models. To not do so is silly. I’ve watched this industry absorb higher-level languages, IDEs, autocomplete, agile and devops, and there were always crusty hold-outs reminiscing about the good old days before X came along and ruined everything. The hold-outs eventually got on board (usually). The difference this time is pace: you could delay adopting “the cloud” for a couple of years and survive. With AI you might get a few months. The way we work has already changed, and it’s not changing back as far as I can tell.

But adoption is the starting line, not the scoreboard. We already know how to measure whether engineering is delivering: DORA metrics, reliability, rate of meaningful change, and ultimately revenue and customer value. Battle-tested, crusty stuff. Why are we throwing all of that out for bullshit AI vanity scores? (I could be wrong about plenty in this post, but I don’t think I’m wrong about that one.)
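To make the outcome-versus-volume distinction concrete, here is a minimal Python sketch of three of the four DORA metrics (deployment frequency, lead time for changes, change failure rate), computed from a list of deployment records. The `Deployment` record shape, its field names, and the sample data are all hypothetical, made up for illustration; real numbers would come from your own deploy and incident tooling. The fourth metric, time to restore service, is left out only to keep the sketch short.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime   # when the change was first committed
    deployed_at: datetime    # when it reached production
    caused_incident: bool    # whether it triggered a production failure


def dora_summary(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarise three DORA metrics for the deployments in one period."""
    if not deploys:
        raise ValueError("need at least one deployment")
    # Lead time for changes: commit -> production, in hours.
    lead_times_h = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys
    ]
    return {
        "deploys_per_week": len(deploys) / (period_days / 7),
        "median_lead_time_hours": median(lead_times_h),
        "change_failure_rate": sum(d.caused_incident for d in deploys) / len(deploys),
    }


# Made-up sample: four deployments over a two-week window.
start = datetime(2026, 1, 5)
deploys = [
    Deployment(start, start + timedelta(hours=4), False),
    Deployment(start + timedelta(days=1), start + timedelta(days=1, hours=8), True),
    Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=6), False),
    Deployment(start + timedelta(days=5), start + timedelta(days=5, hours=2), False),
]
summary = dora_summary(deploys, period_days=14)
print(summary)
```

Note what the function never asks for: lines of code, token counts, or the share of a diff that was AI-written. None of those are inputs to any of the three numbers.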

So here’s the question to smuggle into your next vendor pitch, exec review, or LinkedIn doom-scroll: is that an outcome, or a volume? It’s amazing how quickly a position or statement deflates when you ask that.

The change is here to stay and the tools are good. The hopeful part is that we already know how to measure what matters (and none of it is counted in tokens).

Be AI-first in how you work, but battle-tested in how you measure it.

Cheers,
Dave
