Program-of-Thought Prompting Outperforms Chain-of-Thought by 15% (2022)

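This study introduces "Program of Thoughts" (PoT), a new prompting method intended to improve language model performance on complex numerical and financial reasoning tasks. Unlike "Chain of Thought" (CoT) prompting, described as the current state of the art, which mixes reasoning *and* computation inside the language model, PoT separates the two processes. PoT uses a language model (such as Codex) to generate a *program* that lays out the reasoning steps, then offloads the actual computation to an external computer. Across eight datasets (GSM, AQuA, SVAMP, TabMWP, MultiArith, FinQA, ConvFinQA, TATQA), in both few-shot and zero-shot settings, this separation yields an average performance gain of around 12% over CoT. Combining PoT with self-consistency decoding achieves state-of-the-art results on the math problem datasets and near state-of-the-art performance on financial question answering. The code and data are publicly available on GitHub.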
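The split between reasoning and computation can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the word problem, the hard-coded `GENERATED_PROGRAM` string (a stand-in for what a model might emit, with no model actually called), the `run_program` helper, and the convention of storing the result in a variable named `ans`. It is not the authors' implementation.

```python
# Hypothetical stand-in for model output. Assumed question:
# "A deposit of 10,000 earns 5% interest compounded annually.
#  What is it worth after 3 years?"
GENERATED_PROGRAM = """
principal = 10000
rate = 0.05
years = 3
ans = principal * (1 + rate) ** years
"""


def run_program(program: str, answer_var: str = "ans"):
    """Execute a generated program and return the value bound to answer_var.

    The model only has to write down the reasoning as code; the Python
    interpreter does the arithmetic.
    """
    namespace = {}
    # Emptying __builtins__ limits what the program can call, but it is
    # NOT a real sandbox. Untrusted code needs proper isolation.
    exec(program, {"__builtins__": {}}, namespace)
    return namespace[answer_var]


result = run_program(GENERATED_PROGRAM)
print(round(result, 2))
```

The point of the design is that the exponentiation is never carried out token by token in the model's output; the model states the formula and the interpreter evaluates it.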

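Hacker News discussion: Program-of-Thought Prompting Outperforms Chain-of-Thought by 15% (arxiv.org), 12 points, submitted by mkagenius 1 hour ago, 2 comments

jey (6 minutes ago): This already seems to be built into the current generation of LLMs. With code execution enabled, GPT-5.x and Claude 4.x appear to run Python code automatically to help with reasoning steps.

jhart99 (8 minutes ago): The underlying paper is from 2022; that should be noted in the title.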

[Submitted on 22 Nov 2022 (v1), last revised 23 Oct 2023 (this version, v4)]

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, by Wenhu Chen and 3 other authors


Abstract: Recently, there has been significant progress in teaching language models to perform step-by-step reasoning to solve complex numerical reasoning tasks. Chain-of-thoughts prompting (CoT) is by far the state-of-the-art method for these tasks. CoT uses language models to perform both reasoning and computation in the multi-step 'thought' process. To disentangle computation from reasoning, we propose 'Program of Thoughts' (PoT), which uses language models (mainly Codex) to express the reasoning process as a program. The computation is relegated to an external computer, which executes the generated programs to derive the answer. We evaluate PoT on five math word problem datasets (GSM, AQuA, SVAMP, TabMWP, MultiArith) and three financial-QA datasets (FinQA, ConvFinQA, TATQA) for both few-shot and zero-shot setups. Under both few-shot and zero-shot settings, PoT can show an average performance gain over CoT by around 12% across all the evaluated datasets. By combining PoT with self-consistency decoding, we can achieve SoTA performance on all math problem datasets and near-SoTA performance on financial datasets. All of our data and code are released on GitHub.
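Combining PoT with self-consistency decoding can be sketched as follows: sample several programs for the same question, execute each one, and take the most common answer. This is a simplified illustration, not the authors' code. The `SAMPLED_PROGRAMS` list is a hard-coded, hypothetical stand-in for sampled model outputs, and the `execute` and `majority_answer` helpers and the `ans` variable convention are assumptions.

```python
from collections import Counter

# Hypothetical samples for the assumed question
# "4 items cost 12 each. What is the total?"
SAMPLED_PROGRAMS = [
    "price = 12\ncount = 4\nans = price * count",
    "ans = 12 * 4",
    "price = 12\ncount = 4\nans = price + count",  # faulty reasoning
    "ans = 12 * undefined_name",                   # fails at run time
]


def execute(program):
    """Run one program; return its `ans` value, or None if it fails."""
    namespace = {}
    try:
        # Emptying __builtins__ is not a real sandbox; it only keeps
        # this sketch from calling arbitrary built-in functions.
        exec(program, {"__builtins__": {}}, namespace)
    except Exception:
        return None
    return namespace.get("ans")


def majority_answer(programs):
    """Execute every sample and return the most frequent answer."""
    answers = [a for a in (execute(p) for p in programs) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


voted = majority_answer(SAMPLED_PROGRAMS)
print(voted)
```

Voting over executed results rather than over program text means two differently written programs that compute the same value count as agreeing, and samples that crash are simply dropped from the vote.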

From: Wenhu Chen
[v1] Tue, 22 Nov 2022 21:06:00 UTC (8,689 KB)
[v2] Fri, 25 Nov 2022 01:49:50 UTC (8,689 KB)
[v3] Tue, 29 Nov 2022 03:46:29 UTC (8,689 KB)
[v4] Mon, 23 Oct 2023 01:27:38 UTC (4,047 KB)
