Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate

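The paper "Latent Agents" proposes a post-training framework that distills the reasoning benefits of multi-agent debate into a single large language model (LLM). Multi-agent debate improves reasoning, but it is compute-intensive because long debate transcripts must be generated before a question is answered. The authors use a two-stage fine-tuning pipeline that combines debate structure learning with internalization via dynamic reward scheduling and length clipping. Across multiple models and benchmarks, the resulting models match or exceed explicit multi-agent debate while using up to 93% fewer tokens. Using activation steering, the authors find that internalization creates "agent-specific subspaces": interpretable directions in activation space that correspond to different agent perspectives. They also demonstrate a safety-relevant application: after instilling malicious agents into the LLM through internalized debate, they apply negative steering to suppress them, and report that harmful behaviors become easier to localize and control, with smaller reductions in general performance than when steering base models. The work offers a more efficient way to deploy debate-style reasoning and practical guidance for controlling internalized reasoning behaviors.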
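The summary does not say how the dynamic reward scheduling or length clipping are defined, so the following is only a hypothetical illustration of the general idea: a correctness reward combined with a token budget that shrinks over training, so that long explicit transcripts are rewarded early and penalized later. The function names, the linear schedule, and the penalty form are all assumptions, not the paper's method.

```python
def length_budget(step, total_steps, start_budget, end_budget):
    """Hypothetical schedule: linearly shrink the allowed token count over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return round(start_budget + frac * (end_budget - start_budget))


def clipped_reward(correct, n_tokens, budget, penalty=1.0):
    """Hypothetical reward: 1 for a correct answer, minus a penalty that grows
    with the fraction of tokens over budget, floored at 0."""
    base = 1.0 if correct else 0.0
    overflow = max(0, n_tokens - budget) / budget
    return max(0.0, base - penalty * overflow)


# Halfway through training the budget has shrunk from 2000 to 1100 tokens.
mid_budget = length_budget(step=50, total_steps=100, start_budget=2000, end_budget=200)
within = clipped_reward(correct=True, n_tokens=1000, budget=mid_budget)  # 1.0
over = clipped_reward(correct=True, n_tokens=1650, budget=mid_budget)    # 0.5
```

Under this assumed shape, a correct answer that fits the current budget keeps the full reward, while one that overshoots loses reward in proportion to the overshoot.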

Original

[Submitted on 27 Apr 2026]

Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate, by John Seon Keun Yi and 2 other authors

Abstract: Multi-agent debate has been shown to improve reasoning in large language models (LLMs). However, it is compute-intensive, requiring generation of long transcripts before answering questions. To address this inefficiency, we develop a framework that distills multi-agent debate into a single LLM through a two-stage fine-tuning pipeline combining debate structure learning with internalization via dynamic reward scheduling and length clipping. Across multiple models and benchmarks, our internalized models match or exceed explicit multi-agent debate performance using up to 93% fewer tokens. We then investigate the mechanistic basis of this capability through activation steering, finding that internalization creates agent-specific subspaces: interpretable directions in activation space corresponding to different agent perspectives. We further demonstrate a practical application: by instilling malicious agents into the LLM through internalized debate, then applying negative steering to suppress them, we show that distillation makes harmful behaviors easier to localize and control with smaller reductions in general performance compared to steering base models. Our findings offer a new perspective for understanding multi-agent capabilities in distilled models and provide practical guidelines for controlling internalized reasoning behaviors. Code available at this https URL
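For readers unfamiliar with activation steering, here is a minimal sketch of the generic technique on synthetic data, assuming a simple difference-of-means direction. The abstract does not state how the authors compute their directions or which layers they steer, so the helper names `steering_direction` and `steer`, the toy activations, and the choice of `alpha` are illustrative assumptions only.

```python
import numpy as np


def steering_direction(acts_with, acts_without):
    """Unit vector pointing from the mean 'without' activation to the mean 'with' activation."""
    diff = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return diff / np.linalg.norm(diff)


def steer(hidden, direction, alpha):
    """Add alpha * direction to each hidden state; a negative alpha suppresses the behavior."""
    return hidden + alpha * direction


# Synthetic stand-ins for hidden states (200 samples, 16 dimensions).
rng = np.random.default_rng(0)
dim = 16
true_dir = np.zeros(dim)
true_dir[3] = 1.0  # assumed axis along which the toy "behavior" is encoded
acts_without = rng.normal(size=(200, dim))
acts_with = rng.normal(size=(200, dim)) + 4.0 * true_dir

direction = steering_direction(acts_with, acts_without)
before = acts_with @ direction                               # projection before steering
after = steer(acts_with, direction, alpha=-4.0) @ direction  # projection after negative steering
```

Because `direction` has unit length, negative steering with `alpha=-4.0` lowers every projection onto it by 4 and leaves the components orthogonal to it unchanged. In a real model the same addition would typically be applied to a layer's hidden states during the forward pass.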

From: John Seon Keun Yi
[v1] Mon, 27 Apr 2026 18:06:03 UTC (8,283 KB)
