Towards a science of scaling agent systems: When and why agent systems work

Original link: https://research.google/blog/towards-a-science-of-scaling-agent-systems-when-and-why-agent-systems-work/

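AI is moving beyond simple question answering toward complex "AI agents" that can reason and interact over sustained sessions, such as coding assistants or health coaches. Traditional AI accuracy metrics are not enough to evaluate these agents, because errors accumulate across multiple steps. A common view holds that adding more specialized agents always improves performance, and early research also reported benefits from scaling. However, a new study, "Towards a Science of Scaling Agent Systems", challenges this assumption. After extensively testing 180 agent configurations, the researchers found that simply *adding* more agents does not guarantee better results. In fact, when agents are not carefully aligned with the task, performance often plateaus or even *declines*. The study establishes the first quantitative principles for scaling agent systems and argues for a more nuanced approach than "more agents is better".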

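A recent Google Research paper on the science of scaling agent systems sparked discussion on Hacker News. The paper tries to understand *when* and *why* agent systems succeed, but commenters were skeptical of broad generalizations given how many complex variables are involved. One user highlighted a successful architecture built from a central coordinator, a planning committee, and parallel task execution. Many replies, however, expressed disappointment with Google's *applications* of AI, particularly the agent features in its products (such as Search and the Gemini integrations), while acknowledging that the underlying models (Gemini, AlphaGo, AlphaFold) are powerful when accessed directly through an API. The prevailing view treated the paper as a "quantitative survey" rather than a breakthrough and recommended healthy skepticism toward Google's current agent implementations.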
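The coordinator-plus-committee architecture mentioned above can be sketched roughly as follows. Everything here is an illustrative assumption and not the commenter's actual code:

- The planners and the worker are stub functions standing in for model calls.
- The "planning committee" is modeled as several planners whose proposed plans are put to a vote.
- Parallel execution uses Python's standard `concurrent.futures.ThreadPoolExecutor`.

The name `coordinate` is hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Plan = Tuple[str, ...]          # an ordered tuple of independent subtasks
Planner = Callable[[str], Plan]  # hypothetical: one committee member
Worker = Callable[[str], str]    # hypothetical: executes one subtask


def coordinate(task: str, committee: Sequence[Planner], worker: Worker,
               max_workers: int = 4) -> List[str]:
    """Hypothetical central coordinator (a sketch, not a real framework API).

    1. Each planner on the committee proposes a plan for the task.
    2. The coordinator adopts the plan proposed most often.
    3. The plan's subtasks run in parallel; results return in plan order.
    """
    if not committee:
        raise ValueError("committee must contain at least one planner")
    proposals = [planner(task) for planner in committee]
    # Plans are tuples, so they are hashable and can be counted as votes.
    plan, _votes = Counter(proposals).most_common(1)[0]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even when run concurrently.
        return list(pool.map(worker, plan))


if __name__ == "__main__":
    committee = [
        lambda t: ("draft intro", "draft body"),
        lambda t: ("draft intro", "draft body"),
        lambda t: ("draft all",),
    ]
    print(coordinate("write report", committee, worker=lambda s: s.upper()))
```

Voting over plans is only one way to read "planning committee". A coordinator could just as well merge the proposals or have one planner critique another.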

Original article

AI agents (systems capable of reasoning, planning, and acting) are becoming a common paradigm for real-world AI applications. From coding assistants to personal health coaches, the industry is shifting from single-shot question answering to sustained, multi-step interactions. While researchers have long used established metrics to optimize the accuracy of traditional machine learning models, agents introduce a new layer of complexity. Unlike isolated predictions, each step an agent takes builds on the steps before it, so a single error can cascade throughout a workflow. This shift compels us to look beyond standard accuracy and ask: How do we actually design these systems for optimal performance?
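A toy calculation, not taken from the paper, makes the cascading-error point concrete. Assume each step of a workflow succeeds independently with the same probability p. Then an n-step workflow succeeds end to end with probability p^n, which drops quickly even when each step looks reliable. The function name and the independence assumption are ours, for illustration only.

```python
def workflow_success_probability(step_accuracy: float, num_steps: int) -> float:
    """Probability that every step of a multi-step workflow succeeds.

    Toy model (an assumption, not the paper's method): steps succeed
    independently, each with the same probability `step_accuracy`,
    and any single failure breaks the whole workflow.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be in [0, 1]")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    # Independent per-step successes multiply.
    return step_accuracy ** num_steps


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, round(workflow_success_probability(0.95, n), 3))
```

Under this model, 95% per-step accuracy gives roughly 77% success over 5 steps, 60% over 10, and 36% over 20. Real agents can sometimes detect and recover from their own mistakes, which this simple model ignores.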

Practitioners often rely on heuristics, such as the assumption that "more agents are better", believing that adding specialized agents will consistently improve results. For example, "More Agents Is All You Need" reported that LLM performance scales with agent count, while collaborative scaling research found that multi-agent collaboration "...often surpasses each individual through collective reasoning."
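One simple way to "add agents" follows the sampling-and-voting idea associated with "More Agents Is All You Need": ask several independent agents the same question and keep the most common answer. The sketch below makes the following assumptions:

- Each agent is just a callable that returns an answer string.
- `majority_vote` and the stub agents are hypothetical stand-ins, not code from that paper.

```python
from collections import Counter
from typing import Callable, Sequence

Agent = Callable[[str], str]  # hypothetical: e.g. one sampled LLM call


def majority_vote(agents: Sequence[Agent], question: str) -> str:
    """Ask every agent the same question and return the most common answer.

    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    if not agents:
        raise ValueError("need at least one agent")
    answers = [agent(question) for agent in agents]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer


if __name__ == "__main__":
    # Stub agents standing in for independent LLM samples.
    agents = [lambda q: "42", lambda q: "41", lambda q: "42"]
    print(majority_vote(agents, "What is 6 * 7?"))  # prints 42
```

Exact-match voting only suits tasks with short, comparable answers. Open-ended outputs would need a similarity measure or a judge to decide which answers "agree".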

In our new paper, “Towards a Science of Scaling Agent Systems”, we challenge this assumption. Through a large-scale controlled evaluation of 180 agent configurations, we derive the first quantitative scaling principles for agent systems, revealing that the "more agents" approach often hits a ceiling, and can even degrade performance if not aligned with the specific properties of the task.
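A classic toy model (the Condorcet jury setting, not the paper's scaling principles) shows how the same "more agents" strategy can help on one task and hurt on another. The model assumes:

- n agents, with n odd, vote independently.
- Each agent is correct with probability p.

Majority-vote accuracy is then the probability that at least (n + 1)/2 agents are correct: the sum over k from (n + 1)/2 to n of C(n, k) p^k (1 - p)^(n - k). This value rises toward 1 as n grows when p > 0.5, and falls toward 0 when p < 0.5. The function name is hypothetical. The independence assumption ignores correlated errors between agents, which would shrink the gains.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent agents is correct.

    Toy model (an assumption for illustration): each agent is correct
    independently with probability p; n must be odd so there are no ties.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    majority = n // 2 + 1
    # Sum the binomial probability of every outcome with a correct majority.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(majority, n + 1))


if __name__ == "__main__":
    for p in (0.6, 0.4):
        row = [round(majority_vote_accuracy(p, n), 3) for n in (1, 5, 15)]
        print(p, row)
```

For p = 0.6, accuracy climbs with every added pair of agents. For p = 0.4, the same ensemble gets steadily worse, which echoes the qualitative point that adding agents pays off only when they suit the task.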
