One agent isn't enough

原始链接: https://benr.build/blog/one-agent-isnt-enough

## Parallel Agents: Overcoming Variance in AI Coding

Agentic coding has inherent variance: the stochastic nature of LLMs means the same prompt can produce different results. Improving prompting alone (context engineering) raises the *average* quality of results, but cannot guarantee finding the *best* solution. This post proposes **parallel convergence**: running multiple agent instances simultaneously to explore a wider slice of the solution space.

The core idea is to treat agent runs as samples, hedging risk and drawing on the "wisdom of the crowd". Parallel agents starting from clean contexts can escape the suboptimal local optima a single agent may get stuck in. When multiple agents independently propose the same approach, that is convergence, and it validates a strong solution.

The workflow is used in two phases: generating multiple solutions to a problem, and gathering varied information about a problem (using specialized agents for git history, documentation, web research, and so on). Claude Code's orchestrator pattern facilitates this, synthesizing the agents' outputs with a bias toward simpler, validated solutions.

Although it consumes a lot of tokens, parallel convergence delivers high-confidence results grounded in multiple independent sources. It is most valuable for complex debugging or planning, and less so for simple tasks. Ultimately, context engineering shapes the *probability* of good solutions, while parallel convergence *finds* the best one.

Hacker News discussion (2 comments):

yawnxyz: If you're trying to get consistent results out of multiple agents, wouldn't it be better to have them ultimately build a repeatable workflow?

bisonbear (reply): Good question - I don't think these are necessarily mutually exclusive, though. I have repeatable workflows that take advantage of multiple agents. A repeatable workflow gives a single agent consistent results; using multiple agents lets you explore the problem space more fully. One example of using both concepts in harmony is a custom slash command that spawns subagents with custom prompts pushing them to explore more. The command plus the agent prompts make the process repeatable and improvable.

Original text

Agentic coding has a problem - variance. What if single-agent runs are leaving performance on the table by design?

Due to the stochastic nature of LLMs, each agent run has slight variations. Even with the same context, one session with an agent might land near the "peak" of where we could expect it to (rolling a 20 on a d20), and another session might land somewhere in the middle of the narrowed probability curve (rolling a 10 instead).

In Part 1 I talked about my mental model around context engineering: the goal of context engineering is to shift the probability distribution of LLM responses, where the "probability distribution" is the space of all possible results from the LLM.

In this piece, I'll talk about extending that mental model to adjust for the fact that we can easily and (relatively) cheaply trigger parallel agent runs.

The goal is more than just decreasing standard deviation and improving the mean quality of the probability distribution - it's also to reliably converge on the best peak we can find.

The Limitations of Context Engineering

Using context engineering best practices (prompt engineering, reference to relevant documentation, targeted addition of relevant tools, skills, etc.) to shift the probability distribution handles the first-order problem: reduce the likelihood of bad outcomes, and raise the floor - improving the average quality of responses.

But it doesn't solve the exploration problem. Even within the improved distribution, there are multiple paths the agent can take to solve the problem. Some are decent. Others are optimal. A few will make you want to slam your head into the keyboard. A single agent run picks one path. You don't know if it's the peak, or just high enough to satisfy you.
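To make that concrete, here's a toy simulation (my own illustration, not from the post): even with the quality distribution held fixed, the best of N independent draws reliably lands above a single draw.

```python
# Toy simulation: context engineering sets the distribution of a single run,
# but taking the best of N independent runs sits closer to the peak.
import random
import statistics

random.seed(0)

def run_quality():
    # Hypothetical "solution quality" of one agent run, on a 0-100 scale.
    return random.gauss(mu=65, sigma=15)

TRIALS, N_AGENTS = 10_000, 5

single = [run_quality() for _ in range(TRIALS)]
best_of_n = [max(run_quality() for _ in range(N_AGENTS)) for _ in range(TRIALS)]

print(f"single run: mean={statistics.mean(single):.1f}")
print(f"best of {N_AGENTS}:  mean={statistics.mean(best_of_n):.1f}")
# The best-of-N mean sits well above the single-run mean, even though the
# underlying distribution (the "context engineering") is identical.
```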

The Second Mental Model

SOLUTION LANDSCAPE

Single Agent:                        Parallel Agents + Synthesis:

   ▲                                    ▲
   │      ╱╲                            │      ╱╲
   │     ╱  ╲    ╱╲                     │     ╱  ╲     ╱╲
   │    ╱    ╲  ╱  ╲                    │    ╱ ②  ╲  ╱   ╲
   │   ╱      ╲╱    ╲   ╱╲              │   ╱       ╲╱  ④ ╲   ╱╲
   │  ╱    ①         ╲╱   ╲            │  ╱   ①            ╲╱   ╲
   │ ╱                      ╲           │ ╱        ③    ⑤        ╲
   └────────────────────────────►       └────────────────────────────►

   You get what you get               Synthesizer picks ② as winner

The solution to the problem is parallel agents (and lots of tokens).

With parallel agents, we take a "sample" multiple times (i.e. multiple runs of the same / similar prompt), explore different peaks, and use the findings from the group to converge on the best solution. We're able to hedge our bets, and using the knowledge of the crowd, consistently get more insight out of the LLM.
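As a rough sketch of the shape of this fan-out: below, `run_agent` is a hypothetical placeholder for whatever agent call you use (in Claude Code, the orchestrator spawns the subagents for you), and the "angles" mirror the modal-debugging example later in the post.

```python
# Minimal fan-out sketch. `run_agent` is a hypothetical wrapper around your
# agent runtime; each task starts from a clean, independent context.
import asyncio

async def run_agent(prompt: str, angle: str) -> str:
    # Placeholder: call your agent here and return its proposed solution.
    await asyncio.sleep(0)
    return f"[proposal from the {angle} angle]"

async def parallel_samples(prompt: str, angles: list[str]) -> list[str]:
    tasks = [run_agent(prompt, angle) for angle in angles]
    return await asyncio.gather(*tasks)

proposals = asyncio.run(parallel_samples(
    "Why does the modal render behind everything despite z-index: 9999?",
    ["data flow", "React hooks", "component layering"],
))
print(proposals)
```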

Why does this work? I'm not making $100M at Meta as an AI researcher so I can't answer - but I'll do my best to speculate.

Multiple Samples

This is the main one that I've been mentioning. Five agents = five independent samples. You're not relying on a single path and some luck to find the peak.

A single agent run might settle on a suboptimal solution - a local optimum that works but isn't great. It found something functional and stopped exploring. Parallel agents with independent starting points can escape these traps. They explore different regions of the problem space, pushing past mediocre solutions to find better ones. The convergence pattern reveals when multiple paths lead to the same superior approach.

Different Starting Points

Clean context windows mean no anchoring bias. Each agent explores from a fresh perspective.

Validation Through Repetition

When two agents independently suggest the same approach, that's evidence it's a local maximum. When all agents diverge, you need more constraints.
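A naive way to picture that check (my own sketch, not how Claude Code actually decides): bucket the proposals by approach and only execute once a quorum of agents agree.

```python
# Naive convergence check (illustration only): group proposals by approach and
# see whether enough independent agents agree before committing.
from collections import Counter

def convergence(proposal_labels: list[str], quorum: float = 0.6) -> str:
    counts = Counter(proposal_labels)
    approach, votes = counts.most_common(1)[0]
    if votes / len(proposal_labels) >= quorum:
        return f"converged on '{approach}' ({votes}/{len(proposal_labels)}) -> execute"
    return "diverged -> tighten constraints or ask the user"

print(convergence(["caching", "caching", "caching", "add index", "caching"]))
print(convergence(["caching", "rewrite DB", "add index"]))
```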

The parallel structure transforms agentic coding from a single random draw into a guided search for peaks.

How I Use Parallel Convergence

I use parallel convergence primarily in two ways, which fall into two workflows:

  1. Generating multiple solutions to a problem
  2. Gathering information from multiple sources about a problem

Here's how it works:

Note that I primarily use Claude Code, which supports subagents via an orchestrator pattern: one main agent spawns subagents and later synthesizes their results.

PARALLEL CONVERGENCE WORKFLOW

         Phase 1                    Phase 2
         GATHER                     SOLVE

         A   B   C                  X   Y   Z
         │   │   │                  │   │   │
         └───┼───┘                  └───┼───┘
             ▼                          ▼
         synthesize ──── plan ───▶ synthesize ──▶ execute

Generate multiple solutions to a problem

In this workflow, I'll use multiple agents to come up with solutions to the same problem. The goal here is to de-risk the fact that any one agent may come up with a sub-par solution.

When spinning up the subagents, Claude may assign them different angles from which to approach the problem, allowing the main Claude instance to explore more of the problem space.

For example, if I'm debugging why a modal renders behind everything despite z-index: 9999 (we've all been there), Claude might approach the problem from data flow, React hooks, and component layering perspectives.

Claude then synthesizes, validates, and proposes a solution based on the outputs from all subagents. If 3/5 subagents came up with a similar solution, then it is more likely that this solution is what we want, and we should move forward with it.

I most commonly use this in debugging cases, but it's also been useful in the planning phase of a more complicated task.

Gather information about a problem

As part of my planning workflow in Claude Code, I dispatch multiple intelligence-gathering subagents. Here are some examples of the agents I'll use:

  • Agent A: scan git history (what patterns exist?)
  • Agent B: search local documentation (what's been tried before?)
  • Agent C: map code paths (what interfaces are available?)
  • Agent D: analyze test coverage (what validation already exists?)
  • Agent E: identify constraints (what are the boundaries?)
  • Agent F: find risks (what do we need to watch out for?)
  • Agent G: web research (what do online resources say?)

Yes, seven agents is excessive. It is! I won't unleash all seven (that's chaos) - but having the full menu available matters.

Each explores independently, approaching the problem from a slightly different perspective and with a chance to discover different information. The goal here is different: in the previous workflow we dispatch agents to discover solutions to the same problem; here we dispatch agents to find distinct but complementary information about the problem.

With this information in hand (and context sufficiently primed), Claude can then proceed with making a plan for solving the problem. (If you want to double down on parallelism, you can also use parallel agents for planning).
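For illustration, the gather-phase roles above can be thought of as role-specific prompts that each become one subagent's task. The wording below is mine, the menu is trimmed, and the problem string is the Sharpe-ratio example from later in the post.

```python
# Role-specific prompts for the gather phase (role names mirror the list above;
# the exact wording is my own, and the menu is deliberately trimmed).
GATHER_ROLES = {
    "git historian":  "Scan git history: what patterns and past decisions exist here?",
    "docs searcher":  "Search local documentation: what's been tried before?",
    "code mapper":    "Map the relevant code paths: what interfaces are available?",
    "web researcher": "Research online sources: what do they say about this?",
}

def gather_tasks(problem: str) -> list[str]:
    # Each string becomes one subagent's task; the orchestrator runs them in
    # parallel and later synthesizes the reports into a plan.
    return [f"{prompt}\n\nProblem: {problem}" for prompt in GATHER_ROLES.values()]

for task in gather_tasks("the prompt claims implausible Sharpe ratios"):
    print(task, end="\n---\n")
```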

What Convergence Looks Like

Here's an example from an "AI hedge fund" project I'm working on for model evaluation.

The problem: the AI could articulate detailed failure modes (good), but claimed Sharpe ratios that would make Renaissance Technologies jealous (bad). It had the form of institutional risk documentation without the calibration of realistic return expectations. I needed to update the prompts to address this case.

I launched 4 parallel intelligence-gathering agents:

  • Agent A (Intelligence Gatherer): Found similar past tasks related to commits adding calibration to edge scoring
  • Agent B (Extracting Patterns from Codebase): Found the same edge scoring calibration pattern in the codebase, noted it was "proven effective"
  • Agent C (Git Historian): Found the exact same commit history, described 3 "calibration improvement waves" over 7 weeks
  • Agent D (Web Researcher): Found Ken French Data Library and AQR research with actual factor premium numbers (momentum: 5-8% annually, quality: 2-4%)

All four agents, exploring from completely different angles (pattern database, codebase analysis, git history, web research), converged on the same solution: add calibration guidance using the existing Anti-Patterns section format, grounded in historical factor data.

Even better: Agent A initially suggested a 60-line dedicated section. But when Claude synthesized all the findings, the convergence pattern showed a simpler path - a 5-line addition to the existing Anti-Patterns section would achieve the same goal without context bloat.

How does Claude actually synthesize? Honestly, I don't control that directly; it's part of Claude Code's orchestrator pattern. But I can see what happens: it weights agreement heavily, surfaces outliers worth considering, and, critically, tends toward simpler solutions when convergence supports it (with additional prompting pushing for simplicity). That's how a 60-line suggestion became 5.
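Purely as a guess at the kind of heuristic at play (not Claude Code's actual logic), a synthesizer might score candidates something like this:

```python
# Guess at a synthesis heuristic: score each candidate by how many agents
# proposed it, break ties toward the smaller change, and keep outliers visible.
from collections import Counter

def synthesize(candidates: list[tuple[str, int]]) -> None:
    """candidates: (approach, estimated_lines_changed) pairs, one per agent."""
    votes = Counter(approach for approach, _ in candidates)
    size = {approach: lines for approach, lines in candidates}  # one size estimate each
    ranked = sorted(votes, key=lambda a: (-votes[a], size[a]))
    winner = ranked[0]
    outliers = [a for a in ranked[1:] if votes[a] == 1]
    print(f"winner: {winner} ({votes[winner]} votes, ~{size[winner]} lines)")
    print(f"outliers worth a look: {outliers}")

synthesize([
    ("extend Anti-Patterns section", 5),
    ("extend Anti-Patterns section", 5),
    ("extend Anti-Patterns section", 8),
    ("new 60-line calibration section", 60),
])
```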

The convergence told me two things:

  1. The solution was validated (4 independent explorations → similar conclusions)
  2. The minimal version was sufficient

The cost was ~10 minutes of parallel agent time, and maybe 200k tokens total. My Claude Code usage limits weep, but the payoff was a high-confidence solution with evidence from multiple independent sources, plus the discipline to keep it simple.

If this sounds excessive for a 5-line change, it is! That's kind of the point.

Even for a ~5-line prompt change, it was worth grounding those 5 lines in past decisions, web research, and agent consensus.

When NOT to Use This

The multi-agent approach doesn't come without its drawbacks:

  • Token use
  • Context bloating in main agent from the additional information
  • Time waiting for agents

A single agent is largely sufficient for well-defined tasks, simple changes, or easy bugs. In other cases, I start to consider using the parallel workflow.

From Random Walk to Guided Convergence

WHAT CONVERGENCE TELLS YOU

     Agents Agree                     Agents Diverge
     ─────────────                    ───────────────

       "caching"                        "caching"
       "caching"                        "rewrite DB"
       "caching"                        "add index"
           │                                │
           ▼                                ▼
        EXECUTE                     1. Tighten constraints
                                    2. Ask user for opinions on path forward

Part 1's model: better context engineering → better distribution → better average outcomes

In this model: better context → better distribution → parallel exploration → convergence validation → optimal outcomes reliably

To summarize: Context engineering creates the right distribution. Parallel convergence finds the peaks within it.

Next Steps

Next time you're debugging a tricky issue, spin up 3 parallel agents, and see if they're able to find something that 1 agent alone couldn't.

This probably sounds more systematic than it felt. In practice, it's a lot of "let's see what happens"! If you have learnings from a similar workflow, or thoughts on the article, reach out to me - I'd love to discuss what you're doing!

— Ben

Part 2 of a series on context engineering and building with AI coding agents. Part 1 introduced probability distributions and information architecture. In subsequent pieces, I'll go into specifics on my workflow, and what I've learned from building KOUCAI.
