The 70% AI productivity myth: why most companies aren't seeing the gains

Original link: https://sderosiaux.substack.com/p/the-70-ai-productivity-myth-why-most

## The AI Productivity Paradox

Despite vendor claims of 70-90% productivity gains, many developers, including industry leaders like OpenAI co-founder Andrej Karpathy, feel *more* behind after adopting AI tools. Independent research shows a clear mismatch: only a small share (roughly 10%) see significant benefits, most see little improvement, and some experience a *productivity decline* (one METR study found experienced developers were 19% slower). The gains concentrate in specific scenarios: AI-native startups, greenfield projects, boilerplate-heavy tasks, and support for early-career developers. The other 90% of developers, wrestling with legacy systems, integration complexity, and steep learning curves, face an "AI fluency tax": substantial upskilling plus an initial productivity dip. The problem is not outright falsehood but misdirection. Expectations are inflated and measurement is flawed; developers often *believe* they are faster even when the data says otherwise. Realistic expectations are a 10-15% productivity improvement and an ROI payback period of 11-13 months. Engineering leaders should focus on targeted applications such as code review and documentation, track time-to-value on realistic codebases, and accept a long, difficult transition.

## AI Productivity: Hype vs. Reality

A recent Hacker News discussion asked whether AI delivers the productivity gains promised to developers. While AI can quickly generate code on request, many commenters argued the benefits are often overstated and come with hidden costs. A major concern is the review burden: either painstaking line-by-line inspection, or accepting significant technical debt from AI-generated code that is poorly written or does not scale. Many users noted that unreviewed AI code quickly produces messy, redundant codebases that are hard to maintain, even for the AI itself. The consensus leaned toward AI being best suited to automating simpler tasks, while complex projects require holistic optimization of the whole software process, not just code generation. Some reported modest productivity gains (15-20%) with advanced models; others saw development *slow down* because of the volume of bugs in output from earlier model versions. A recurring theme is that AI tools demand significant learning and skilled use, a kind of "fluency tax", and over-reliance may even erode basic programming skills. Ultimately, the discussion suggests that realizing AI's potential requires careful integration, robust testing, and attention to overall development practices rather than expecting a quick productivity boost.

Original article

"I've never felt this much behind as a programmer."

That's Andrej Karpathy, OpenAI co-founder and one of the most respected AI researchers alive, writing in December 2025. He describes a "magnitude 9 earthquake" rocking the profession. A "powerful alien tool" handed around with no manual.

Now consider the narrative you've been hearing from vendors, executives, and LinkedIn thought leaders: AI has collapsed software development costs by 70-90%. Development velocity is through the roof. If you're not seeing these gains, you're doing it wrong.

These two realities don't fit together. If even Karpathy feels behind, what hope does the average enterprise engineering team have?

The answer is uncomfortable: the 70-90% productivity claim is true for about 10% of the industry. For the other 90%, it's a marketing hallucination masquerading as data.

Let's start with what vendors promise.

GitHub claims Copilot makes developers 55% faster. Google reports similar figures. Microsoft suggests 20-30% improvements. OpenAI's enterprise report touts that users save 40-60 minutes per day.

Now let's look at independent research.

A randomized controlled study by METR (Model Evaluation & Threat Research) found something that should terrify every CTO: experienced developers using AI tools took 19% longer to complete tasks than those working without them.

Not beginners. Not interns fumbling with ChatGPT. Experienced engineers. On codebases they knew. With tools designed to make them faster.

They got slower.

The Stack Overflow 2025 Developer Survey adds nuance. While 52% of developers report some positive productivity impact from AI tools, only a minority experience transformative gains. 46% now actively distrust AI output accuracy, up from 31% last year. The number-one frustration, cited by 66% of developers: AI solutions that are "almost right, but not quite", leading to time-consuming debugging.

Perhaps most telling is the perception gap.

In the METR study, developers predicted AI would make them 24% faster before starting. After finishing 19% slower, they still believed they'd been 20% faster.

Read that again. They got measurably slower but remained convinced they'd sped up.

This isn't just a productivity problem. It's a measurement problem. If teams can't tell they're slower, how many companies are bleeding productivity while celebrating their AI transformation? How many engineering leaders are making headcount decisions based on gains that don't exist?

The disconnect between perception and reality explains why the hype persists. Every developer who feels faster reinforces the narrative, regardless of what the stopwatch says.

Are the productivity claims lies? No. They're something worse: true in a lab, false in production.

When a claim only works for 10% of teams but gets marketed as universal, that's not context-dependence. That's misdirection.

The gains are real for:

  • AI-native startups. No legacy systems. No accumulated tech debt. No workforce that needs retraining. When your entire stack was designed post-2024, AI tools slot in naturally. One CTO at a high-growth SaaS company told Menlo Ventures that 90% of their code is now AI-generated via Cursor and Claude Code, up from 10-15% twelve months prior with GitHub Copilot.

  • Greenfield projects. Starting fresh on a modern stack with clear requirements? AI accelerates scaffolding, boilerplate, and initial implementation dramatically. There's no context to load, no legacy patterns to respect.

  • Boilerplate-heavy tasks. CRUD operations, API wrappers, test scaffolding, documentation. These are AI's sweet spot: repetitive, well-documented patterns with low novelty.

  • Early-career developers. 56% use AI daily, higher than any other cohort. For them, AI is a learning accelerator. It helps them navigate unfamiliar codebases and learn patterns faster.

The common thread: low complexity, high repetition, minimal context.

Now let's talk about the other 90% of the industry.

  • The legacy infrastructure wall. Industry research widely cites that organizations spend up to 80% of their IT budgets maintaining outdated systems. Over 70% of digital transformation initiatives stall due to legacy infrastructure bottlenecks. AI tools trained on modern frameworks don't know what to do with your 2008 Struts application or your COBOL batch jobs.

They hallucinate solutions that look plausible until they hit production. They suggest refactors that would take six months of human work to validate. And when leadership asks why you're not seeing the 70% gains, you're stuck explaining that the vendor demos didn't include 15-year-old Java monoliths held together with duct tape and prayers.

  • The AI fluency tax. This isn't free to learn. BairesDev's Q3 2025 survey found developers spend nearly 4 hours per week on AI-related upskilling. Microsoft's research shows it takes 11 weeks for developers to fully realize productivity gains from AI coding tools, with teams often experiencing an initial productivity dip during ramp-up. That's nearly three months of learning before you break even, and that assumes the tools don't change underneath you.

  • Integration complexity. Legacy systems often can't communicate with modern AI tooling. This creates silos and friction that AI can't bridge.

  • Human factors. Resistance to change is real. Trust deficits are real. When 46% of your engineering team doesn't trust the output, adoption stalls.

Karpathy's quote is worth reading in full because it captures something the productivity studies miss: AI development isn't just "coding plus autocomplete". It's an entirely new paradigm.

Here's his list of what developers now need to master:

"agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations"

And the kicker:

"a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering."

This isn't learning a new library or framework. This is learning to work with something that is:

  • Stochastic: it gives different outputs for the same inputs

  • Fallible: it makes mistakes with high confidence

  • Unintelligible: you can't debug why it did what it did

  • Changing: the tools update constantly
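
To make the first two properties concrete, here is a minimal sketch of how validation has to change when the generator is non-deterministic: instead of trusting a single output, you re-sample candidates and gate each one behind tests. The `generate_code` function below is a hypothetical stand-in for whatever tool you use, not a real API.

```python
# Sketch: working with a stochastic, fallible code generator.
# `generate_code` is a fake stand-in that randomly returns candidate
# implementations, imitating the run-to-run variance of a real AI tool.
import random
from typing import Callable, Optional


def generate_code(prompt: str) -> Callable[[int, int], int]:
    # Hypothetical: in reality this would call an AI tool, and the result
    # varies between runs (stochastic) and is sometimes confidently wrong.
    candidates = [
        lambda a, b: a + b,            # correct
        lambda a, b: a + b + 1,        # confidently wrong (fallible)
        lambda a, b: abs(a) + abs(b),  # plausible-looking edge-case bug
    ]
    return random.choice(candidates)


def accept_if_tests_pass(
    prompt: str,
    tests: list[tuple[tuple, int]],
    attempts: int = 5,
) -> Optional[Callable[[int, int], int]]:
    """Re-sample candidates until one passes every test, or give up."""
    for _ in range(attempts):
        candidate = generate_code(prompt)
        if all(candidate(*args) == expected for args, expected in tests):
            return candidate
    return None  # escalate to a human instead of shipping an unverified guess


adder = accept_if_tests_pass("write an add function", [((2, 3), 5), ((-1, 1), 0)])
print("accepted" if adder else "rejected all candidates")
```

The point is not the toy example but the habit: with a stochastic, fallible tool, the acceptance criterion moves from "the code looks right" to "every candidate is independently verified".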

For experienced developers, this may actually be harder. They have decades of muscle memory around deterministic systems. They've internalized debugging strategies that don't apply when the "bug" is an LLM hallucination with no stack trace.

The data supports this. Only 48% of developers use AI agents or advanced tooling. A majority (52%) either don't use agents at all or stick to simpler AI tools. 38% have no plans to adopt them.

So what should engineering leaders actually expect?

  • On productivity gains: Bain's Technology Report 2025 found that teams using AI assistants report perhaps 10-15% productivity improvement. Not 70%. Not even 30%. This aligns with McKinsey's broader finding of 5-20% cost savings across operations.

  • On ROI timelines: Expect 11-13 months before seeing meaningful organizational returns. Individual developers may see gains sooner (after the 11-week ramp-up), but system-wide ROI requires training, integration, and process change.
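
As a back-of-envelope check on those numbers, here is a minimal payback sketch using the figures from this article (10-15% steady-state gain, an ~11-week ramp-up, ~4 hours per week of upskilling) plus assumptions the article does not supply: fully loaded developer cost, per-seat tool price, the size of the initial dip, and a one-time integration and training cost. Treat it as something to plug your own numbers into, not a definitive model.

```python
# Back-of-envelope payback sketch. Figures marked "article" come from this post;
# everything marked "assumption" is illustrative and should be replaced with
# your own numbers.

WEEKS_PER_MONTH = 52 / 12


def months_to_payback(
    devs: int = 50,
    loaded_cost_per_dev_month: float = 15_000.0,  # assumption
    tool_cost_per_dev_month: float = 40.0,         # assumption
    one_time_rollout_per_dev: float = 10_000.0,    # assumption: integration, training, process change
    steady_state_gain: float = 0.12,               # article: 10-15%
    ramp_weeks: float = 11.0,                      # article: ~11 weeks to realize gains
    upskill_hours_per_week: float = 4.0,           # article: ~4 h/week of upskilling
    work_hours_per_week: float = 40.0,
    ramp_dip: float = 0.05,                        # assumption: initial productivity dip
) -> int:
    """Months until cumulative benefit exceeds cumulative cost (capped at 5 years)."""
    cumulative = -devs * one_time_rollout_per_dev
    for month in range(1, 61):
        if month * WEEKS_PER_MONTH <= ramp_weeks:
            # Ramp-up: upskilling time plus the initial dip are pure cost.
            lost_fraction = ramp_dip + upskill_hours_per_week / work_hours_per_week
            monthly_benefit = -lost_fraction * loaded_cost_per_dev_month
        else:
            monthly_benefit = steady_state_gain * loaded_cost_per_dev_month
        cumulative += devs * (monthly_benefit - tool_cost_per_dev_month)
        if cumulative > 0:
            return month
    return -1  # never pays back within 5 years


print(months_to_payback())
```

With these illustrative inputs the model pays back at around month 11, in the same ballpark as the 11-13 month figure above; change the assumptions and the answer moves accordingly.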

On where to focus:

  • Boilerplate generation and scaffolding

  • Code review acceleration

  • Documentation and test generation

  • Onboarding acceleration for new team members

On where NOT to expect miracles:

  • Complex legacy refactoring

  • Architectural decisions requiring deep context

  • Novel problem-solving with high stakes

  • Systems with poor documentation or unusual patterns

Stop measuring adoption and start measuring outcomes. A framework:

  1. Run your own pilots with realistic codebases. Not toy examples. Not greenfield projects. Your actual legacy systems with your actual team.

  2. Track time-to-value, not acceptance rates. How many AI suggestions get accepted means nothing if the resulting code needs extensive debugging. (A minimal tracking sketch follows this list.)

  3. Budget for the ramp-up period. Build the 11-week productivity dip into your ROI models. If you expect immediate gains, you'll abandon ship before the value arrives.

  4. Separate "AI-assisted" from "AI-generated". Autocomplete suggestions are different from full function generation. Track them separately.

  5. Create internal benchmarks based on your stack. Vendor demos use modern frameworks with clean architectures. Yours probably doesn't. Measure against your reality, not theirs.
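
As a starting point for items 2 and 4, here is a minimal sketch of outcome tracking: tag each change with how AI was involved and compare cycle time and rework rather than suggestion-acceptance rates. The data model and field names are illustrative, not taken from any particular tool.

```python
# Sketch: compare outcomes by AI involvement instead of counting accepted suggestions.
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Literal

AiInvolvement = Literal["none", "ai_assisted", "ai_generated"]


@dataclass
class Change:
    opened: datetime
    merged: datetime
    reopened_for_fixes: int        # follow-up fixes needed after merge (rework signal)
    involvement: AiInvolvement


def cycle_time_hours(c: Change) -> float:
    return (c.merged - c.opened).total_seconds() / 3600


def summarize(changes: list[Change]) -> dict[str, dict[str, float]]:
    """Median cycle time and rework rate, broken out by AI involvement."""
    out: dict[str, dict[str, float]] = {}
    for bucket in ("none", "ai_assisted", "ai_generated"):
        group = [c for c in changes if c.involvement == bucket]
        if not group:
            continue
        out[bucket] = {
            "median_cycle_time_h": median(cycle_time_hours(c) for c in group),
            "rework_rate": sum(c.reopened_for_fixes > 0 for c in group) / len(group),
        }
    return out


# Tiny illustrative dataset
now = datetime(2025, 12, 1)
sample = [
    Change(now, now + timedelta(hours=20), 0, "none"),
    Change(now, now + timedelta(hours=14), 1, "ai_generated"),
    Change(now, now + timedelta(hours=16), 0, "ai_assisted"),
]
print(summarize(sample))
```

Feed it from your actual PR metadata (labels, commit trailers, or a survey field) and the comparison between buckets tells you whether AI involvement is shortening cycle time or just shifting work into review and rework.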

The AI productivity revolution is real, but it's not evenly distributed. It's happening at AI-native startups, on greenfield projects, and for developers who've invested the ramp-up time. For the rest of the industry, we're in a long, expensive transition.

Stop comparing your enterprise team to Twitter demos of developers spinning up React apps from scratch. That's a different planet from your reality of legacy systems, complex domains, and large codebases.

Measure what actually matters: cycle time, defect rates, developer satisfaction. Not lines of code generated or suggestions accepted.

Accept that the learning curve is real and long. Your senior engineers aren't failing if they haven't mastered the new stack in a quarter. Karpathy himself calls it a "magnitude 9 earthquake". Give people time.

And stop pretending the earthquake isn't happening. The 70-90% claims may be overblown for most organizations today. But the trajectory is clear. AI tooling is getting better. The abstraction layer is maturing. The manual is being written, slowly.

But the next time a vendor tells you AI will 10x your team, ask them one question: "Show me the enterprise teams running legacy systems who got those gains."

If they can't, walk away.

Karpathy's closing advice: "Roll up your sleeves to not fall behind."

That's the only honest path forward. Not hype. Not denial. Just work.
