Coding assistants are solving the wrong problem

Original link: https://www.bicameral-ai.com/blog/introducing-bicameral

## AI in Software Development: A Mixed Picture

Recent data casts doubt on the immediate production benefits of AI coding assistants. Teams using AI completed 21% more tasks, yet overall delivery metrics did not improve. Surprisingly, experienced developers were actually *slower* with AI assistance, even though they *felt* faster. A key problem is that AI handles ambiguity poorly: it buries requirements gaps inside the code, inflating review time and security vulnerabilities (48% of AI-generated code contains flaws). The core issue is not code generation but the surrounding development process. Developers spend only 16% of their time writing code; the rest goes to clarification, reviews, and operational tasks. Today, AI actually *amplifies* those inefficiencies, often canceling out the time saved on coding. Still, seasoned engineers show what is possible, using AI to generate large bodies of code quickly. The key appears to be shifting developers' focus *from* coding *to* product engineering: architecture and high-level design. Ultimately, successful AI integration depends on reducing ambiguity *before* coding begins, surfacing critical engineering context (such as state machine and data flow gaps) during product discussions, and augmenting rather than replacing human oversight. The emphasis should be on using AI to clarify requirements and understand existing systems, not merely to automate code creation.


Original article

The verdict on the effectiveness of AI use in production is starting to come in, and it is not a pretty picture.

  • Teams using AI completed 21% more tasks, yet company-wide delivery metrics showed no improvement (Index.dev, 2025)
  • Experienced developers were 19% slower when using AI coding assistants—yet believed they were faster (METR, 2025)
  • 48% of AI-generated code contains security vulnerabilities (Apiiro, 2024)

To understand why, we have to take a closer look at day-to-day software development work. Consider this point raised in a colorful exchange on r/ExperiencedDev:

A developer’s job is to reduce ambiguity. We take the business need and outline its logic precisely so a machine can execute. The act of writing the code is the easy part. Odds are, you aren’t creating perfect code specs in tickets, even with meeting notes, because developers will encounter edge cases that demand clarification over the course of implementation…

This comment raises two key points. First, coding assistants need clearly defined requirements in order to perform well. Second, edge cases and product gaps are often discovered over the course of implementation.

These two facts collide when coding agents are applied to complex codebases. Unlike their human counterparts, who would escalate a requirements gap to product when necessary, coding assistants are notorious for burying those gaps within hundreds of lines of code, leading to breaking changes and unmaintainable code.
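To make the failure mode concrete, here is a minimal sketch; the scenario, names, and thresholds are all invented for illustration. A ticket says "apply the discount to eligible orders" without defining "eligible", an assistant fills the gap with a plausible default, and the guess ships as if it were a decision:

```python
# Hypothetical ticket: "apply the discount to eligible orders" -- with
# "eligible" never defined. All names and thresholds here are invented.

from dataclasses import dataclass

@dataclass
class Order:
    total: float
    customer_tier: str  # e.g. "standard" or "gold"

def apply_discount(order: Order) -> float:
    # What an assistant tends to do: silently decide that "eligible" means
    # gold-tier and over $100. Neither threshold appears in the ticket,
    # yet both are now load-bearing and buried in the diff.
    if order.customer_tier == "gold" and order.total > 100:
        return round(order.total * 0.9, 2)
    return order.total

def apply_discount_escalating(order: Order) -> float:
    # What a human counterpart tends to do: refuse to guess and push the
    # question back to product before writing the branch.
    raise NotImplementedError(
        "Eligibility criteria are undefined in the ticket; needs product input."
    )
```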

As a result, more overhead is spent on downstream code reviews (Index.dev, 2025) and fire-patching security vulnerabilities (Apiiro, 2024).

In other words, the use of AI in production settings often increases ambiguity and reduces code reliability, directly contradicting the objective of developers.

The picture is not without optimism. Some experienced engineers report transformative results: one principal engineer at Google claimed AI “generated what we built last year in an hour”; Boris Cherny, creator of Claude Code, described a month where he “didn’t open an IDE at all” while the model “wrote around 200 PRs, every single line” (Pragmatic Engineer, 2025). The optimistic case is that developers evolve from coders into product engineers, focusing on architecture and product thinking while AI handles implementation.

This, however, reflects the experience of seasoned developers who have both the technical depth to review AI output critically and the autonomy within their organizations to straddle product and engineering.

For much of the software engineering workforce, the junior and mid-level engineers at banks, healthcare providers, and government agencies, there is much less wiggle room. They are sandwiched between the unreliability of AI output and management’s growing expectation to ship faster, resulting in a rapidly widening empathy gap between developers and product owners.

## The widening empathy gap between developers and product owners

Product context often passes through multiple layers (end users -> marketers -> product managers) before landing in developers’ laps, a chain necessitated by the separation of responsibilities within an organization and the unique demands of each industry. Using coding agents effectively may require a level of team coordination that simply does not justify the gains in technical output.

But what if we have simply been approaching the problem from the wrong angle? If we tackle the pain points of software development from first principles, can we come up with solutions that organically decrease ambiguity and reliably increase engineering velocity in production?

Consider how developers spend their time (IDC, 2024):

[Chart: How developers spend their time in 2024]

Only 16% of a developer’s time goes to writing code. The rest? Security and code reviews, monitoring, deployments, requirements clarification—operational work that keeps the lights on but doesn’t ship features.

Here’s the irony: AI coding assistants save developers roughly 10 hours per week, but increased inefficiencies in the other parts of the development lifecycle almost entirely cancel out those gains (Atlassian, 2025). Here’s a comment from the earlier-cited Redditor:

They produce legitimate-looking code, and if no one has had the experience of thinking through the assumptions and then writing them into code, considering the edge cases, it’ll be lgtm’d and shipped. You’re shifting the burden of this feedback cycle to the right, after the code is output, and that makes us worse off since code is tougher to read than write.

There’s a name for the misalignment between business intent and codebase implementation: technical debt. Using coding agents without carefully delineating their scope and responsibilities threatens to accelerate tech debt accumulation.

Throwing AI code generation at existing codebases doesn’t solve the problem, because contrary to what the label “tech debt” may suggest, most tech debt isn’t actually created in the code; it’s created in product meetings. Deadlines. Scope cuts. “Ship now, optimize later.” Those decisions shape the system, but the reasoning rarely makes it into the code.

[Engineers] occasionally have access to complete data; at other times, they must work with limited information. They might be conscious of uncertainties surrounding their evidence, but frequently they are not. Competing social, financial, and strategic priorities influence the tradeoffs in unexpected ways. — Rios et al., 2024

How can we make this context-sharing and decision-making process less chaotic? We surveyed developers across different roles and team sizes about their product-engineering handoff process. The results were striking: most discover unexpected codebase constraints weekly, after having already committed to a product direction and the corresponding architectural implementation. When asked what would help most, two themes dominated:

  1. Reducing ambiguity upstream so engineers aren’t blocked waiting on product clarification mid-implementation
  2. A clearer picture of affected services and edge cases to allow for more precise feature scoping and time allocation

When asked which engineering context would be most valuable to surface during product discussions, three categories stood out: state machine gaps (unhandled states caused by user interaction sequences; see the sketch below), data flow gaps, and downstream service impacts.

[Chart: Identifying how feature updates affect existing architectures and data flow was rated the most desirable engineering context to surface after a product meeting.]
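To illustrate what a state machine gap looks like, consider this deliberately simplified, hypothetical checkout flow (every name in it is invented): a routine user interaction sequence lands on a transition nobody specified.

```python
from enum import Enum, auto

class Checkout(Enum):
    CART = auto()
    PAYMENT = auto()
    CONFIRMED = auto()

# Hypothetical transition table for a checkout flow.
TRANSITIONS = {
    (Checkout.CART, "pay"): Checkout.PAYMENT,
    (Checkout.PAYMENT, "success"): Checkout.CONFIRMED,
    # Gap: no (PAYMENT, "back") entry. What happens to a pending charge
    # if the user navigates back mid-payment? Product never said.
}

def step(state: Checkout, event: str) -> Checkout:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        # An unhandled (state, event) pair is a requirements question,
        # not a coding question; surface it instead of inventing a branch.
        raise ValueError(f"unhandled transition: {state.name} + {event!r}")

# step(Checkout.PAYMENT, "back") raises: the gap is explicit, not buried.
```

Making the transition table explicit turns the missing case into a visible question for product, rather than a branch an assistant silently invents.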

This aligns with decades of SDLC research showing that the costliest defects stem from misalignment between requirements and architecture, and such gaps often go unnoticed until it is too late.

Luckily, the advancement of coding LLMs works in our favor here. Whereas generating fully functional code through natural-language prompting is prone to errors due to the aforementioned context problem, the reverse process, mapping out existing code structures and inferring how a specific requirement may impact them, is much more tenable with recent models.
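As a rough illustration of that reverse direction, here is a toy sketch using Python's standard ast module; it is nothing like a production analysis tool, but it shows how existing code structure can be mapped against a requirement-relevant symbol:

```python
# Toy sketch: statically map which top-level functions reference a symbol
# that a new requirement would change. Real tooling would go much further.

import ast

def functions_touching(source: str, symbol: str) -> list[str]:
    """Names of functions whose bodies reference `symbol` by name or attribute."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            names = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            attrs = {n.attr for n in ast.walk(node) if isinstance(n, ast.Attribute)}
            if symbol in names or symbol in attrs:
                hits.append(node.name)
    return hits

sample = """
def price(order): return order.total * tax_rate
def label(order): return order.id
"""
print(functions_touching(sample, "tax_rate"))  # ['price']
```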

From this vantage point, the possibilities for improving the development lifecycle are endless. Some suggested a real-time display of engineering context during meetings to help steer discussions; others requested a code review bot that detects discrepancies between the code implementation and stated product or business requirements.
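As a hedged sketch of that second suggestion (the paths and ticket format are invented, and a real bot would need semantic, likely LLM-based, comparison rather than path matching), the crude structural core might look like this:

```python
# Invented sketch: flag changed files that fall outside the scope a ticket
# declares. A real bot would need semantic comparison; this is the crude core.

def out_of_scope(changed_files: list[str], declared_scope: list[str]) -> list[str]:
    """Changed files not under any path prefix the ticket declared in scope."""
    return [
        f for f in changed_files
        if not any(f.startswith(prefix) for prefix in declared_scope)
    ]

# Ticket says the change "touches billing only" (hypothetical paths).
declared = ["services/billing/"]
changed = ["services/billing/invoice.py", "services/auth/session.py"]
print(out_of_scope(changed, declared))  # ['services/auth/session.py']
```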

All in all, developers are eager to try out new tools that augment the existing way of doing things, provided they retain flexibility over when such tools are deployed. There is also little reservation about longer but more fruitful product meetings: the difficulty of conveying blockers is the real source of frustration.

At Bicameral, we are committed to taking this pragmatic approach to alleviating software development pains, moving beyond lab benchmarks to investigate the most effective ways to deploy AI in the wild.

Our thesis is that LLMs could be a huge boon both for the industry and for individual developers—channeling the unrivaled human capacity to operate under uncertainty and adapt—provided the technology is developed with human needs in mind.

If you’re a developer, we want to learn which types of context hurt most when they’re missing from discussion, based on your unique experience.

Survey link: https://form.typeform.com/to/w4rPXoPD


  1. Index.dev. (2025). AI Coding Assistant ROI: Real Productivity Data.
  2. METR. (2025). Measuring AI’s Ability to Complete Long Tasks.
  3. Apiiro. (2024). 4x Velocity, 10x Vulnerabilities: AI Coding Assistants Are Shipping More Risks.
  4. IDC. (2024). How Do Software Developers Spend Their Time?
  5. Atlassian. (2025). State of Developer Experience Report.
  6. Rios, N., et al. (2024). Technical Debt: A Systematic Literature Review.
  7. Pragmatic Engineer. (2025). When AI Writes Almost All Code.