Embracing the parallel coding agent lifestyle

Original link: https://simonwillison.net/2025/Oct/5/parallel-coding-agents/

## Embracing the parallel coding agent workflow (October 2025)

A growing number of engineers are running multiple coding agents at once, such as Claude Code and Codex, to boost productivity. The author was initially skeptical because of the review bottleneck, but has started adopting this "parallel agent lifestyle" for tasks that do not overwhelm his primary work.

Effective use cases include **research and proofs of concept** (quickly testing library integrations or understanding an existing codebase), **small maintenance tasks** (such as fixing deprecation warnings), and **carefully specified work** in which the agent receives detailed instructions. The last of these cuts review time significantly because the goal and approach have already been decided.

Currently using Claude Code, Codex CLI/Cloud, Copilot Coding Agent, and Google Jules, the author runs agents in terminal windows, often against fresh git checkouts for isolation. For riskier tasks he prefers asynchronous agents.

The field is evolving quickly and sharing workflows is encouraged. Recommended resources include articles by Jesse Vincent and Josh Bleecher Snyder, covering techniques such as sending a "scout" agent to identify problem areas before a full implementation. The key is experimentation: learning from both successes and failures is essential to refining this new approach.

## Embracing the parallel coding agent lifestyle - discussion summary

This Hacker News discussion centers on the emerging practice of using multiple AI coding agents at the same time. While the approach promises higher productivity, the key bottleneck identified is **human review**. Users find that although agents can generate code quickly, the need for thorough verification offsets some of the gains, shifting the developer's role toward more of a "super-manager" function.

Several users share their experience of managing these agents, stressing the importance of clear instructions and structured workflows. Some advocate tools for better managing diffs from multiple agents, while others emphasize robust testing and a cautious approach to avoid shipping garbage code.

A recurring theme compares managing agents to managing human developers: building trust and understanding individual patterns is essential, but the unpredictability of agent output makes that hard. The discussion also touches on potential burnout and the extra cognitive load organizations will need to absorb. Ultimately, the consensus leans toward a future in which AI augments rather than replaces developers, requiring a new skill set focused on oversight and integration.

## Original post

5th October 2025

For a while now I’ve been hearing from engineers who run multiple coding agents at once—firing up several Claude Code or Codex CLI instances at the same time, sometimes in the same repo, sometimes against multiple checkouts or git worktrees.

I was pretty skeptical about this at first. AI-generated code needs to be reviewed, which means the natural bottleneck on all of this is how fast I can review the results. It’s tough keeping up with just a single LLM given how fast they can churn things out, so where’s the benefit from running more than one at a time if it just leaves me further behind?

Despite my misgivings, over the past few weeks I’ve noticed myself quietly starting to embrace the parallel coding agent lifestyle.

I can only focus on reviewing and landing one significant change at a time, but I’m finding an increasing number of tasks that can still be fired off in parallel without adding too much cognitive overhead to my primary work.

Here are some patterns I’ve found for applying parallel agents effectively.

Research for proof of concepts

The first category of tasks I’ve been applying this pattern to is research.

Research tasks answer questions or provide recommendations without making modifications to a project that you plan to keep.

A lot of software projects start with a proof of concept. Can Yjs be used to implement a simple collaborative note writing tool with a Python backend? The libraries exist, but do they work when you wire them together?

Today’s coding agents can build a proof of concept with new libraries and resolve those kinds of basic questions. Libraries too new to be in the training data? Doesn’t matter: tell them to check out the repos for those new dependencies and read the code to figure out how to use them.
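To make this concrete, here is a minimal sketch of the kind of proof of concept an agent might produce for the Yjs question above. It assumes the pycrdt library (Python bindings to the Yrs CRDT that Yjs is built on); the specific API calls are illustrative rather than anything from the original post:

```python
# Tiny proof of concept: can two Y documents on the Python side exchange
# an update and converge? pycrdt wraps the same Yrs CRDT that backs Yjs.
from pycrdt import Doc, Text

doc1 = Doc()
doc1["note"] = note1 = Text()
note1 += "Hello from the first editor"

# Serialize doc1's full state and feed it to a second, independent document.
update = doc1.get_update()

doc2 = Doc()
doc2["note"] = note2 = Text()
doc2.apply_update(update)

assert str(note2) == "Hello from the first editor"
print("Documents converged:", str(note2))
```

If a toy script like this converges, the harder question of wiring it up to a real backend is worth pursuing; if it doesn't, you've learned that cheaply.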

How does that work again?

If you need a reminder about how a portion of your existing system works, modern “reasoning” LLMs can provide a detailed, actionable answer in just a minute or two.

It doesn’t matter how large your codebase is: coding agents are extremely effective with tools like grep and can follow codepaths through dozens of different files if they need to.

Ask them to make notes on where your signed cookies are set and read, or how your application uses subprocesses and threads, or which aspects of your JSON API aren’t yet covered by your documentation.

These LLM-generated explanations are worth stashing away somewhere, because they can make excellent context to paste into further prompts in the future.
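As a sketch of what that stashing could look like, assuming Claude Code's non-interactive `-p`/`--print` mode and a hypothetical `notes/` directory in the project:

```python
# Sketch: ask a coding agent a research question in headless mode and save
# the answer as a dated note that can be pasted into future prompts.
# Assumes the Claude Code CLI is installed; -p prints one response and exits.
import datetime
import pathlib
import subprocess

question = (
    "Make notes on where this application sets and reads signed cookies: "
    "list the relevant files and functions."
)

result = subprocess.run(
    ["claude", "-p", question],
    capture_output=True,
    text=True,
    check=True,
)

notes_dir = pathlib.Path("notes")
notes_dir.mkdir(exist_ok=True)
note_path = notes_dir / f"{datetime.date.today()}-signed-cookies.md"
note_path.write_text(result.stdout)
print(f"Saved agent notes to {note_path}")
```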

Small maintenance tasks

Now we’re moving on to code edits that we intend to keep, albeit very low-stakes ones. It turns out there are a lot of problems that really just require a little bit of extra cognitive overhead, which can be outsourced to a bot.

Warnings are a great example. Is your test suite spitting out a warning that something you are using is deprecated? Chuck that at a bot—tell it to run the test suite and figure out how to fix the warning. No need to take a break from what you’re doing to resolve minor irritations like that.
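For a sense of the scale involved, this is the kind of mechanical change such a task usually boils down to; the example below uses Python's `datetime.utcnow()` deprecation (as of Python 3.12) purely as an illustration:

```python
# Example of the sort of fix a deprecation warning calls for:
# datetime.utcnow() is deprecated in Python 3.12+ in favor of
# timezone-aware datetimes.
from datetime import datetime, timezone


def created_timestamp() -> str:
    # Before (emits DeprecationWarning on Python 3.12+):
    #   return datetime.utcnow().isoformat()
    # After:
    return datetime.now(timezone.utc).isoformat()
```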

There is a definite knack to spotting opportunities like this. As always, the best way to develop that instinct is to try things—any small maintenance task is something that’s worth trying with a coding agent. You can learn from both their successes and their failures.

Carefully specified and directed actual work

Reviewing code that lands on your desk out of nowhere is a lot of work. First you have to derive the goals of the new implementation: what’s it trying to achieve? Is this something the project needs? Is the approach taken the best for this current project, given other future planned changes? A lot of big questions before you can even start digging into the details of the code.

Code that started from your own specification is a lot less effort to review. If you already decided what to solve, picked the approach and worked out a detailed specification for the work itself, confirming it was built to your needs can take a lot less time.

I described my more authoritarian approach to prompting models for code back in March. If I tell them exactly how to build something, the work needed to review the resulting changes is a whole lot less taxing.

How I’m using these tools today

My daily drivers are currently Claude Code (on Sonnet 4.5), Codex CLI (on GPT-5-Codex), and Codex Cloud (for asynchronous tasks, frequently launched from my phone.)

I’m also dabbling with GitHub Copilot Coding Agent (the agent baked into the GitHub.com web interface in various places) and Google Jules, Google’s currently-free alternative to Codex Cloud.

I’m still settling into patterns that work for me. I imagine I’ll be iterating on my processes for a long time to come, especially as the landscape of coding agents continues to evolve.

I frequently have multiple terminal windows open running different coding agents in different directories. These are currently a mixture of Claude Code and Codex CLI, running in YOLO mode (no approvals) for tasks where I’m confident malicious instructions can’t sneak into the context.

(I need to start habitually running my local agents in Docker containers to further limit the blast radius if something goes wrong.)
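A rough sketch of what that could look like, assuming Docker, a hypothetical coding-agent image with the CLI preinstalled, and Claude Code's `--dangerously-skip-permissions` flag for the no-approvals mode:

```python
# Sketch: run an agent in a throwaway container so a YOLO-mode session can
# only write to the mounted checkout. "coding-agent" is a hypothetical image;
# API credentials and network restrictions are omitted for brevity.
import pathlib
import subprocess

checkout = pathlib.Path("/tmp/myproject-agent-1")  # illustrative path

subprocess.run(
    [
        "docker", "run", "--rm", "-it",
        "-v", f"{checkout}:/workspace",  # expose only this checkout
        "-w", "/workspace",
        "coding-agent",
        "claude", "--dangerously-skip-permissions",
    ],
    check=True,
)
```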

I haven’t adopted git worktrees yet: if I want to run two agents in isolation against the same repo I do a fresh checkout, often into /tmp.
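The fresh-checkout step itself is easy to script; a minimal sketch (the repository path is illustrative):

```python
# Sketch: clone a local repository into a scratch directory so a second agent
# gets a fully isolated working copy.
import pathlib
import subprocess
import tempfile

repo = pathlib.Path.home() / "dev" / "myproject"  # illustrative source repo
scratch = tempfile.mkdtemp(prefix="agent-checkout-", dir="/tmp")

subprocess.run(["git", "clone", str(repo), scratch], check=True)
print(f"Run the second agent inside {scratch}")
```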

For riskier tasks I’m currently using asynchronous coding agents—usually Codex Cloud—so if anything goes wrong the worst that can happen is my source code getting leaked (since I allow it to have network access while running). Most of what I work on is open source anyway so that’s not a big concern for me.

I occasionally use GitHub Codespaces to run VS Code’s agent mode, which is surprisingly effective and runs directly in my browser. This is particularly great for workshops and demos since it works for anyone with a GitHub account, no extra API key necessary.

Please share your patterns that work

This category of coding agent software is still really new, and the models have only really got good enough to drive them effectively in the past few months—Claude 4 and GPT-5 in particular.

I plan to write more as I figure out the ways of using them that are most effective. I encourage other practitioners to do the same!

Jesse Vincent wrote How I’m using coding agents in September, 2025, which describes his workflow for parallel agents in detail, including having an architect agent iterate on a plan which is then reviewed and implemented by fresh instances of Claude Code.

In The 7 Prompting Habits of Highly Effective Engineers Josh Bleecher Snyder describes several patterns for this kind of work. I particularly like this one:

Send out a scout. Hand the AI agent a task just to find out where the sticky bits are, so you don’t have to make those mistakes.

I’ve tried this a few times with good results: give the agent a genuinely difficult task against a large codebase, with no intention of actually landing its code, just to get ideas from which files it modifies and how it approaches the problem.
