AI is forcing us to write good code

Original link: https://bits.logic.inc/p/ai-is-forcing-us-to-write-good-code

## Building for Agentic Coders: Prioritizing "Good Code"

As AI agents become coding partners, traditionally "optional" software engineering best practices are now *essential*. Agents lack a human's ability to recover easily from mistakes and need strong guardrails. The team's experience highlights the key investments behind successful agent integration.

**Core principles:** They enforce **100% code coverage**, not to prevent bugs, but to guarantee that the agent has verified the behavior of *every line* of code it wrote. This removes ambiguity and yields a clear "todo" list of tests. They also prioritize **clear code organization**, using descriptive filenames and small, well-scoped files to help agents load context.

**Automated environments:** Fast, automated dev environments are essential. A single command spins up an isolated worktree with the necessary configuration, enabling rapid iteration. This depends on fast tests (10,000+ assertions in under a minute, with caching) and standards enforced automatically by linters and formatters.

**Strong typing and data integrity:** TypeScript plus strong database types (Postgres with triggers) minimizes errors and doubles as self-documenting code. Semantic naming conventions improve the agent's understanding.

Ultimately, investing in these "good code" practices is not a burden but the foundational step that unlocks the full potential of agentic coding.

## AI and Code Quality: Summary

This Hacker News discussion centers on the idea that using AI, and large language models (LLMs) in particular, is *forcing* developers to write better code. The core argument is not that AI writes perfect code, but that getting useful results from it requires developers to prioritize clarity, thorough testing (aiming for 100% coverage), and formal specifications.

Many commenters point out that LLMs excel at *implementing* well-defined tasks but struggle with ambiguity and poor codebases. This is pushing developers toward stricter practices, such as comprehensive test suites and detailed specifications written *before* coding begins. Some even use formal verification tools such as TLA+ to produce precise specifications for the AI.

There is some debate over whether 100% test coverage is feasible or always beneficial, but the consensus leans toward the view that the constraints AI-assisted development imposes (the need for clear, testable code) are ultimately good for software quality.

The discussion also touches on AI's potential to accelerate development, while cautioning against relying on it blindly and stressing the importance of continued human oversight and architectural thinking.

## Original Article

For decades, we’ve all known what “good code” looks like. Thorough tests. Clear documentation. Small, well-scoped modules. Static typing. Dev environments you can spin up without a minor religious ritual.

These things were always optional, and time pressure usually meant optional got cut.

Agents need these optional things though. They aren’t great at making a mess and cleaning it up later. Agents will happily be the Roomba that rolls over dog poop and drags it all over your house.

The only guardrails are the ones you set and enforce. If the agentic context is lacking and the guardrails aren’t sufficient, you’ll find yourself in a world of pain. But if the guardrails are solid, the LLM can bounce around tirelessly until the only path out is the correct one.

Our six-person team has made a lot of specific and, sometimes, controversial investments to accommodate our agentic coders. Let’s talk about some of the less obvious ones.

The most controversial guideline we have is our most valuable: We require 100% code coverage.

Everyone is skeptical when they hear this until they live with it for a day. It feels like a secret weapon at times.

Coverage, as we use it, isn’t strictly about bug prevention; it’s about guaranteeing the agent has double-checked the behavior of every line of code it wrote.

The usual misinterpretation is that people think we believe 100% coverage means “no bugs”. Or that we’re chasing a metric, and metrics get gamed. Neither of those is the case here.

Why 100%? At 95% coverage, you’re still making decisions about what’s “important enough” to test. At 99.99%, you don’t know if that uncovered line in ./src/foo.ts was there before you started work on the new feature. At 100%, there’s a phase change and all of that ambiguity goes away. If a line isn’t covered, it’s because of something you actively just did.

The coverage report becomes a simple todo list of tests you still need to write. It’s also one less degree of freedom we have to give to the agent to reason about.
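As a minimal sketch of how a hard threshold like this can be enforced mechanically (assuming a Vitest setup; the specific tool is illustrative, not necessarily what we run):

```ts
// vitest.config.ts -- illustrative only; any coverage tool with hard thresholds works.
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      // Fail the run (and the agent's feedback loop) if any line, branch,
      // function, or statement is left uncovered.
      thresholds: {
        lines: 100,
        branches: 100,
        functions: 100,
        statements: 100,
      },
    },
  },
})
```

With thresholds like this, the test run itself is the guardrail: an uncovered line fails the build, and the failing coverage report is exactly the todo list described above.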

When a model adds or changes code, we force it to demonstrate how that line behaves. It can’t stop at “this seems right.” It has to back it up with an executable example.

Other nice benefits: Unreachable code gets deleted. Edge cases are made explicit. And code reviews become easier because you see concrete examples of how every aspect of the system is expected to behave or change.

The main mechanism agentic tools use to navigate your codebase is the filesystem. They list directories, read filenames, search for strings, and pull files into context.

You should treat your directory structure and file naming with the same thoughtfulness you’d treat any other interface.

A file called ./billing/invoices/compute.ts communicates much more than ./utils/helpers.ts, even if the code inside is identical. Help the LLM out and give your files thoughtful organization.

Additionally, prefer many small, well-scoped files.

It improves how context gets loaded. Agents often summarize or truncate large files when they pull them into their working set. Small files reduce that risk. If a file is short enough to be loaded in full, the model can keep the entire thing active in context.

In practice, it will speed up the agent’s flow and eliminate a whole class of degraded performance.

In the old world, you lived in one dev environment. This is where you’d craft your perfect solution, tweak things, run commands, restart servers, and gradually converge on a solution.

With agents, you do something closer to beekeeping, orchestrating across processes without knowing the specifics of what exactly is happening within each of them. So you need to cultivate a good and healthy hive.

You need your automated guardrails to run quickly, because you need to run them often.

The goal is to keep the agent on a short leash: make a small change, check it, fix it, repeat.

You can run them a few ways: agent hooks, git hooks, or just prompting (i.e. in your AGENTS.md), but no matter how you run them, your quality checks need to be cheap enough that running them constantly is not slowing things down.

In our setup, every npm test creates a brand new database, runs migrations, and executes the full suite.

This only works for us because we’ve made each of those stages exceptionally fast. We run tests with high concurrency, strong isolation, and a caching layer for third-party calls. We have 10,000+ assertions that finish in about a minute. Without caching, it takes 20-30 minutes, which would add hours if you expected an agent to run tests several times per task.
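As a rough illustration of the caching idea (not our actual implementation), third-party calls can be routed through a thin wrapper that hashes the request and replays a stored response on later runs. The `fetchWithCache` helper and the `.test-cache` directory below are hypothetical:

```ts
// test/fetch-cache.ts -- hypothetical sketch of a test-only cache for third-party calls.
import { createHash } from 'node:crypto'
import { mkdir, readFile, writeFile } from 'node:fs/promises'
import path from 'node:path'

const CACHE_DIR = path.join(process.cwd(), '.test-cache')

export async function fetchWithCache(
  url: string,
  init: { method?: string; body?: string } = {},
): Promise<string> {
  // Key the cache on everything that affects the response we care about.
  const key = createHash('sha256')
    .update(JSON.stringify({ url, method: init.method ?? 'GET', body: init.body ?? null }))
    .digest('hex')
  const cachePath = path.join(CACHE_DIR, `${key}.json`)

  try {
    // Cache hit: replay the stored response body without touching the network.
    return await readFile(cachePath, 'utf8')
  } catch {
    // Cache miss: make the real call once, then persist the body for future runs.
    const response = await fetch(url, init)
    const body = await response.text()
    await mkdir(CACHE_DIR, { recursive: true })
    await writeFile(cachePath, body, 'utf8')
    return body
  }
}
```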

Once you get comfortable with agents, you naturally start running many of them. You’ll spin up and tear down many dev environments multiple times a day. That has to all be fully automated or you’ll avoid doing it.

We have a simple workflow here:

new-feature <name>

That command creates a new git worktree, copies in local config that doesn’t live in git (like .env files), installs dependencies, and then starts your agent with a prompt to interview you to write a PRD together. If the feature name is descriptive enough, it may even ask to skip the interview and get right to work, figuring out the rest of the context on its own.
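A stripped-down sketch of what such a command might do (the branch naming, copied files, and the `agent` CLI below are placeholders, not our real tooling):

```ts
// scripts/new-feature.ts -- simplified sketch; paths, branch names, and the agent
// command are placeholders rather than our actual setup.
import { execSync } from 'node:child_process'
import { copyFileSync, existsSync } from 'node:fs'
import path from 'node:path'

const name = process.argv[2]
if (!name) {
  console.error('usage: new-feature <name>')
  process.exit(1)
}

const worktreePath = path.resolve('..', `worktree-${name}`)

// 1. Create an isolated worktree on a fresh branch.
execSync(`git worktree add ${worktreePath} -b ${name}`, { stdio: 'inherit' })

// 2. Copy local-only config that doesn't live in git.
for (const file of ['.env', '.env.local']) {
  if (existsSync(file)) copyFileSync(file, path.join(worktreePath, file))
}

// 3. Install dependencies in the new worktree.
execSync('npm install', { cwd: worktreePath, stdio: 'inherit' })

// 4. Hand off to the agent with an opening prompt (CLI name is a placeholder).
execSync(`agent "Interview me so we can write a PRD for: ${name}"`, {
  cwd: worktreePath,
  stdio: 'inherit',
})
```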

The important part isn’t our specific scripts. It’s the latency. If it takes minutes and involves a bunch of tinkering and manual configuration, you won’t do it. If it is one command and takes 1-2 seconds, you’ll do it constantly.

In our case, one command gives you a fresh, working environment almost immediately, with an agent ready to start.

The final piece is being able to run each environment at the same time. Having a bunch of worktrees doesn’t help if you can only have one of them active at a time.

That means anything that could conflict (e.g. ports, database names, caches, background jobs) needs to be configurable (ideally via environment variables) or otherwise allocated in some conflict-free way.

If you use Docker you get some of this for free, but the general requirement is the same: you need a solid isolation story so you can run several fully functioning dev environments on one machine without cross-talk.
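One way to sketch the conflict-free allocation (illustrative; the variable names and defaults here are made up) is to derive every resource that could collide from a per-worktree environment variable, with explicit overrides:

```ts
// config/env.ts -- illustrative sketch; variable names and defaults are hypothetical.
// Each worktree gets its own WORKTREE_NAME, and everything that could collide
// (ports, database names, cache prefixes) is derived from it or overridable.
const worktree = process.env.WORKTREE_NAME ?? 'main'

// Small deterministic hash so each worktree lands on a distinct default port.
const portOffset =
  worktree.split('').reduce((sum, ch) => sum + ch.charCodeAt(0), 0) % 1000

export const config = {
  httpPort: Number(process.env.PORT ?? 3000 + portOffset),
  databaseName: process.env.DATABASE_NAME ?? `app_dev_${worktree}`,
  redisPrefix: process.env.REDIS_PREFIX ?? `${worktree}:`,
}
```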

More broadly, automate the enforcement of as many best practices as you can. Remove degrees of freedom from the LLM. If you’re not already using automatic linters and formatters, start there. Make those as strict as possible and configured to automatically apply fixes whenever the LLM finishes a task or is about to commit.
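A minimal sketch of that kind of check script, runnable after every agent task or as a pre-commit step (the exact tool invocations are illustrative, not our precise configuration):

```ts
// scripts/check.ts -- illustrative; `eslint --fix` and `prettier --write` apply
// fixes automatically, and anything they can't fix fails the script so the task
// bounces back to the agent instead of landing in a commit.
import { execSync } from 'node:child_process'

const steps = [
  'eslint . --fix --max-warnings 0',
  'prettier --write .',
  'tsc --noEmit',
]

for (const step of steps) {
  execSync(`npx ${step}`, { stdio: 'inherit' }) // throws on a non-zero exit code
}
```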

But you should also be using a typed language.

Entire categories of illegal states and transitions can be eliminated. And types shrink the search space of possible actions the model can take, while doubling as source-of-truth documentation describing exactly what kind of data flows through each layer.

We lean on TypeScript pretty heavily. If something can be reasonably represented cleanly in the type system, we do it.

And we push semantic meaning into the type names. The goal is to make “what is this?” and “where does it go?” answerable at a glance.

When you’re working with agents, good semantic names are an amplifier. If the model sees a type like UserId, WorkspaceSlug, or SignedWebhookPayload, it can immediately understand what kind of thing it is dealing with. It can also search for that thing easily.

Generic names like T are fine when you’re writing a small self-contained generic algorithm, but much less helpful when you’re communicating intent inside a real business system.
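A small sketch of what pushing semantics into type names can look like with branded types in TypeScript (the type names come from the examples above; the helper and function are illustrative, not our real code):

```ts
// Branded (nominal-ish) types: structurally they're strings, but the brand keeps
// a UserId from being passed where a WorkspaceSlug is expected.
type Brand<T, Name extends string> = T & { readonly __brand: Name }

type UserId = Brand<string, 'UserId'>
type WorkspaceSlug = Brand<string, 'WorkspaceSlug'>

function getWorkspaceMembers(workspace: WorkspaceSlug): UserId[] {
  // ...fetch members for the workspace; body elided in this sketch.
  return []
}

const slug = 'acme-inc' as WorkspaceSlug

getWorkspaceMembers(slug)
// getWorkspaceMembers('usr_123' as UserId) // type error: UserId is not a WorkspaceSlug
```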

On the API side, we use OpenAPI and generate well-typed clients, so the frontend and backend agree on shapes.
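As a rough illustration of the shape this gives you (the `ApiRoutes` type and `apiGet` helper below are hypothetical stand-ins for whatever your OpenAPI generator actually emits):

```ts
// Hypothetical shape of generator output: one type describing each route's response.
interface ApiRoutes {
  '/invoices/{id}': { response: { id: string; totalCents: number; status: 'draft' | 'paid' } }
  '/workspaces': { response: { slug: string; name: string }[] }
}

// A typed GET helper: the compiler ties each path string to its response shape,
// so the frontend and backend can't silently disagree.
async function apiGet<P extends keyof ApiRoutes>(path: P): Promise<ApiRoutes[P]['response']> {
  const res = await fetch(`https://api.example.com${path}`)
  return res.json() as Promise<ApiRoutes[P]['response']>
}

async function showInvoice(): Promise<void> {
  // Path parameter substitution is elided in this sketch.
  const invoice = await apiGet('/invoices/{id}')
  console.log(invoice.status, invoice.totalCents) // both fully typed, no manual casts
}
```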

On the data side, we use Postgres’ type system as best we can, and add checks and triggers for invariants that don’t fit into simple column types. Postgres doesn’t have a particularly rich type system, but it has enough there to enforce a surprising amount of correctness. If an agent tries to write invalid data, our database will usually complain clearly and loudly. And we use Kysely to generate well-typed TypeScript clients for us.
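For instance (a hypothetical invariant, not one of our actual constraints), a rule that column types alone can’t express can be added in a Kysely migration with raw SQL:

```ts
// migrations/add-invoice-total-check.ts -- hypothetical example constraint.
import { Kysely, sql } from 'kysely'

export async function up(db: Kysely<unknown>): Promise<void> {
  // The column type says total_cents is an integer; the check enforces that it
  // can never go negative, so an agent writing bad data fails loudly.
  await sql`
    alter table invoices
      add constraint invoices_total_cents_non_negative check (total_cents >= 0)
  `.execute(db)
}

export async function down(db: Kysely<unknown>): Promise<void> {
  await sql`
    alter table invoices
      drop constraint invoices_total_cents_non_negative
  `.execute(db)
}
```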

All of our other 3rd-party clients either give us good types, or we wrap them to give us good types.

Agents are tireless and often brilliant coders, but they’re only as effective as the environment you place them in. Once you realize this, “good code” stops feeling superfluous and starts feeling essential.

Yes, the upfront work feels like a tax, but it’s the same tax we’ve all been dodging for years. So pay it intentionally. Put it on your agentic roadmap, get it funded by eng leadership, and finally ship the codebase you always hoped for.
