Search-R1: Training LLMs to Reason and Leverage Search Engines with RL

Original link: https://arxiv.org/abs/2503.09516

Search-R1, proposed by Bowen Jin et al., is a novel method for training large language models (LLMs) to use search engines effectively for reasoning and knowledge acquisition. Unlike prompting approaches that merely guide an LLM to search at inference time, Search-R1 uses reinforcement learning (RL) to train the LLM to autonomously generate search queries and interact with real-time retrieval during step-by-step reasoning. At its core, Search-R1 optimizes LLM rollouts with multi-turn search interactions, applies techniques such as retrieved-token masking for stable RL training, and relies on a simple outcome-based reward. This lets the LLM learn effective search strategies. Experiments on seven question-answering datasets show significant gains over strong baselines: 26% for Qwen2.5-7B, 21% for Qwen2.5-3B, and 10% for LLaMA3.2-3B. The paper also provides empirical insights into RL optimization, LLM choice, and response-length dynamics in retrieval-augmented reasoning. Code and model checkpoints are publicly released.
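To make the multi-turn interaction concrete, here is a minimal sketch of the kind of rollout loop the summary describes: the LLM generates until it either emits a search query (triggering live retrieval, whose results are appended to the context) or a final answer. The tag names (`<search>`, `<information>`, `<answer>`) and the stub functions `generate` and `retrieve` are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a Search-R1-style multi-turn rollout.
# `generate` and `retrieve` are stand-in callables, not the paper's code.

def rollout(question, generate, retrieve, max_turns=4):
    """Interleave LLM generation with real-time retrieval until an answer appears."""
    context = question
    for _ in range(max_turns):
        segment = generate(context)          # LLM continues the trajectory
        context += segment
        if "<answer>" in segment:            # terminal: final answer emitted
            break
        if "<search>" in segment:            # LLM issued a search query
            query = segment.split("<search>")[1].split("</search>")[0]
            docs = retrieve(query)           # live search-engine call
            # retrieved text is appended to the context but, per the paper,
            # masked out of the RL loss during training
            context += f"<information>{docs}</information>"
    return context
```

During RL training, each trajectory produced this way is scored by the outcome-based reward on the final answer.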


Original text

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, by Bowen Jin and 5 other authors

Abstract: Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Prompting advanced LLMs with reasoning capabilities during inference to use search engines is not optimal, since the LLM does not learn how to optimally interact with the search engine. This paper introduces Search-R1, an extension of the DeepSeek-R1 model where the LLM learns -- solely through reinforcement learning (RL) -- to autonomously generate (multiple) search queries during step-by-step reasoning with real-time retrieval. Search-R1 optimizes LLM rollouts with multi-turn search interactions, leveraging retrieved token masking for stable RL training and a simple outcome-based reward function. Experiments on seven question-answering datasets show that Search-R1 improves performance by 26% (Qwen2.5-7B), 21% (Qwen2.5-3B), and 10% (LLaMA3.2-3B) over strong baselines. This paper further provides empirical insights into RL optimization methods, LLM choices, and response length dynamics in retrieval-augmented reasoning. The code and model checkpoints are available at this https URL.
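The abstract's two training ingredients, retrieved-token masking and an outcome-based reward, can be sketched as follows. The masking keeps tokens injected by the search engine from contributing to the policy-gradient loss, since the model did not generate them; the reward simply checks the final answer. The tag names and the exact-match reward are assumptions for illustration, not necessarily the paper's precise formulation.

```python
# Illustrative sketch of retrieved-token masking and an outcome-based reward.

def loss_mask(tokens):
    """Return per-token mask: 1 = model-generated (trains the policy),
    0 = retrieved text between <information> tags (masked from the RL loss)."""
    mask, inside = [], False
    for tok in tokens:
        if tok == "<information>":
            inside = True
        mask.append(0 if inside else 1)
        if tok == "</information>":
            inside = False
    return mask

def outcome_reward(prediction, gold):
    """Simple outcome-based reward: score only the final answer string."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0
```

Masking the retrieved spans is what the abstract credits with stabilizing RL training: without it, the policy gradient would push probability mass toward text the model never chose to emit.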
From: Bowen Jin
[v1] Wed, 12 Mar 2025 16:26:39 UTC (196 KB)
[v2] Wed, 19 Mar 2025 21:40:12 UTC (196 KB)