# Your job is to deliver code you have proven to work

Original link: https://simonwillison.net/2025/Dec/18/code-proven-to-work/

## AI-assisted development: the responsibility still rests with the engineer

The rise of AI coding tools (such as LLMs) has not lessened a developer's core responsibility: delivering code that *works*, not merely code an AI *generated*. A worrying trend is junior engineers submitting large, untested changes and expecting code review to catch the mistakes, a practice the post calls rude and a dereliction of duty. Real software engineering now centres on verification, which requires **two key steps**: **manual testing** (demonstrating the feature yourself, ideally with recorded steps/output or even a screen capture) and **automated testing** (writing tests that *prove* the change works and guard against regressions). LLMs actually make automated testing *easier*, so there is no excuse for skipping it. Crucially, even with coding agents, a human must still take responsibility. Engineers have to learn to steer the AI to *prove* its changes, mirroring the manual/automated testing process. Well-organised existing tests help agents learn and stay consistent. Ultimately, the contributions that matter are those backed by demonstrable evidence that they work, shifting the emphasis from code *generation* to code *verification*.

## Hacker News discussion summary: proving code works

A recent Hacker News thread on Simon Willison's post sparked discussion about the changing role of software engineers in the era of AI-assisted coding. The core argument is that merely *generating* code (now easy with LLMs) is no longer valuable; **what matters is *proving* the code works, with demonstrable evidence.** Commenters broadly agreed, stressing the importance of thorough PRs: a clear explanation of the change, testing notes (manual *and* automated), and visual evidence (screenshots/video). Many noted a worrying trend of junior engineers submitting large, untested PRs and expecting code review to catch the mistakes. The discussion also touched on accountability: LLMs can generate code, but they cannot be held responsible for its correctness, so the responsibility for verification and validation still rests with the engineer. Some proposed future systems in which AI agents could be held accountable through economic penalties or automated verification. Ultimately, the consensus was that strong engineering skills (understanding the problem domain, writing clear code, and testing rigorously) matter *more* than ever, even *with* powerful AI tools. Value shifts from implementation to verification, and to making sure the code truly belongs in the existing codebase.

## Original article

18th December 2025

In all of the debates about the value of AI-assistance in software development there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

This is rude, a waste of other people’s time, and is honestly a dereliction of duty as a software developer.

Your job is to deliver code you have proven to work.

As software engineers we don’t just crank out code—in fact these days you could argue that’s what the LLMs are for. We need to deliver code that works—and we need to include proof that it works as well. Not doing that directly shifts the burden of the actual work to whoever is expected to review our code.

### How to prove it works

There are two steps to proving a piece of code works. Neither is optional.

The first is manual testing. If you haven’t seen the code do the right thing yourself, that code doesn’t work. If it does turn out to work, that’s honestly just pure chance.

Manual testing skills are genuine skills that you need to develop. You need to be able to get the system into an initial state that demonstrates your change, then exercise the change, then check and demonstrate that it has the desired effect.

If possible I like to reduce these steps to a sequence of terminal commands which I can paste, along with their output, into a comment in the code review. Here’s a recent example.
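As a minimal sketch of what that can look like, a small script such as the one below turns those verification steps into something repeatable whose commands and output can be pasted straight into the PR. The `mytool` command and its subcommands are hypothetical stand-ins, not taken from the post:

```python
# Sketch of turning manual verification steps into a repeatable transcript.
# The `mytool` command and its subcommands are hypothetical examples;
# substitute your own project's commands.
import subprocess

STEPS = [
    ["mytool", "init", "--empty"],             # get the system into a known state
    ["mytool", "import", "fixtures.csv"],      # exercise the change
    ["mytool", "report", "--format", "json"],  # observe the effect
]

for cmd in STEPS:
    print(f"$ {' '.join(cmd)}")
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout.rstrip())
```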

Some changes are harder to demonstrate. It’s still your job to demonstrate them! Record a screen capture video and add that to the PR. Show your reviewers that the change you made actually works.

Once you’ve tested the happy path where everything works you can start trying the edge cases. Manual testing is a skill, and finding the things that break is the next level of that skill that helps define a senior engineer.

The second step in proving a change works is automated testing. This is so much easier now that we have LLM tooling, which means there’s no excuse at all for skipping this step.

Your contribution should bundle the change with an automated test that proves the change works. That test should fail if you revert the implementation.

The process for writing a test mirrors that of manual testing: get the system into a known initial state, exercise the change, assert that it worked correctly. Building a test harness that makes this productive is another key skill worth investing in.
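As a minimal sketch of that shape, a test like the one below follows arrange, act, assert and should fail if the implementation is reverted. The `slugify` function and its `max_length` parameter are hypothetical examples, not code from the post:

```python
# Arrange / act / assert, mirroring the manual testing steps above.
# `slugify` and `max_length` are hypothetical stand-ins for a real change.
from myproject.text import slugify


def test_slugify_respects_max_length():
    # Arrange: input long enough to trigger the new truncation behaviour
    title = "A very long blog post title that keeps on going"

    # Act: exercise the change
    slug = slugify(title, max_length=20)

    # Assert: this fails if the max_length implementation is reverted
    assert len(slug) <= 20
    assert slug.startswith("a-very-long")
```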

Don’t be tempted to skip the manual test because you think the automated test has you covered already! Almost every time I’ve done this myself I’ve quickly regretted it.

### Make your coding agent prove it first

The most important trend in LLMs in 2025 has been the explosive growth of coding agents—tools like Claude Code and Codex CLI that can actively execute the code they are working on to check that it works and further iterate on any problems.

To master these tools you need to learn how to get them to prove their changes work as well.

This looks exactly the same as the process I described above: they need to be able to manually test their changes as they work, and they need to be able to build automated tests that guarantee the change will continue to work in the future.

Since they’re robots, automated tests and manual tests are effectively the same thing.

They do feel a little different though. When I’m working on CLI tools I’ll usually teach Claude Code how to run them itself so it can do one-off tests, even though the eventual automated tests will use a system like Click’s CliRunner.
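For example, an automated test for a Click command can drive it in-process like this. The `cli` group and its `hello --count` option are hypothetical, not from the post:

```python
# Sketch of an automated test for a Click CLI using click.testing.CliRunner.
# `cli` and the `hello --count` option are hypothetical stand-ins.
from click.testing import CliRunner

from mytool.cli import cli


def test_hello_repeats_greeting():
    runner = CliRunner()
    result = runner.invoke(cli, ["hello", "--count", "2"])

    assert result.exit_code == 0
    assert result.output.count("Hello") == 2
```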

When working on CSS changes I’ll often encourage my coding agent to take screenshots when it needs to check if the change it made had the desired effect.
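One way to set that up is a short script the agent can run after each change. This sketch assumes Playwright, which the post doesn't name, plus a hypothetical local dev server and output filename:

```python
# Sketch of a screenshot helper a coding agent could run to check a CSS change.
# Playwright is an assumption; the URL and output filename are hypothetical.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    page.goto("http://localhost:8000/")  # hypothetical local dev server
    page.screenshot(path="after-change.png", full_page=True)
    browser.close()
```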

The good news about automated tests is that coding agents need very little encouragement to write them. If your project has tests already most agents will extend that test suite without you even telling them to do so. They’ll also reuse patterns from existing tests, so keeping your test code well organized and populated with patterns you like is a great way to help your agent build testing code to your taste.
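As one sketch of what such a pattern might look like, a shared fixture in `conftest.py` gives an agent something concrete to imitate. The `create_app` factory and Flask-style test client here are hypothetical, not from the post:

```python
# conftest.py: a reusable pattern an agent can copy when adding new tests.
# `create_app` and the in-memory database URL are hypothetical examples.
import pytest

from myproject.app import create_app


@pytest.fixture
def client():
    """A test client backed by a fresh in-memory database for each test."""
    app = create_app(database_url="sqlite:///:memory:")
    with app.test_client() as client:
        yield client
```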

Developing good taste in testing code is another of those skills that differentiates a senior engineer.

### The human provides the accountability

A computer can never be held accountable. That’s your job as the human in the loop.

Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That’s no longer valuable. What’s valuable is contributing code that is proven to work.

Next time you submit a PR, make sure you’ve included your evidence that it works as it should.
