Program-of-Thought Prompting Outperforms Chain-of-Thought by 15% (2022)

Original link: https://arxiv.org/abs/2211.12588

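This study introduces "Program of Thoughts" (PoT), a prompting method intended to improve language model performance on complex numerical and financial reasoning tasks. Unlike Chain-of-Thought (CoT) prompting, which the paper describes as the state of the art and which has the language model perform both reasoning *and* computation, PoT separates the two. PoT uses a language model (such as Codex) to generate a *program* that expresses the reasoning steps, then offloads the actual computation to an external computer that executes the program. Across eight datasets (GSM, AQuA, SVAMP, TabMWP, MultiArith, FinQA, ConvFinQA, TATQA), in both few-shot and zero-shot settings, this separation yields an average performance gain of around 12% over CoT. Combining PoT with self-consistency decoding achieves state-of-the-art results on the math problem datasets and near state-of-the-art performance on the financial QA datasets. The code and data are publicly available on GitHub.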
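The split between reasoning and computation can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the word problem, the hard-coded `GENERATED_PROGRAM` string (a stand-in for what a code-generating model might return), the `run_program` helper, and the convention of storing the result in a variable named `ans`. No model is actually called.

```python
# Hypothetical question: "Pens cost 3 dollars each. How much do 12 pens
# cost after a 10% discount?" (illustrative only, not from the paper's datasets)

# Stand-in for the program text a language model might generate.
# The model only writes the reasoning as code; it does not compute the result.
GENERATED_PROGRAM = """
price_per_pen = 3
num_pens = 12
discount_rate = 0.10
total_before_discount = price_per_pen * num_pens
ans = total_before_discount * (1 - discount_rate)
"""


def run_program(program, answer_var="ans"):
    """Execute generated code in a fresh namespace and return the answer variable.

    Assumption: the program stores its final result in `answer_var`.
    """
    namespace = {}
    # Caution: exec runs arbitrary code. Real model output is untrusted
    # and should be executed in a sandbox.
    exec(program, namespace)
    return namespace[answer_var]


result = run_program(GENERATED_PROGRAM)
print(round(result, 2))
```

The arithmetic is done by the Python interpreter rather than by the model, which is the separation described above.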

Hacker News
Program-of-Thought Prompting Outperforms Chain-of-Thought by 15% (arxiv.org)
12 points by mkagenius 1 hour ago | 2 comments

jey 6 minutes ago
This seems to already be built into the current generation of LLMs: with code execution enabled, GPT-5.x and Claude 4.x appear to run Python code automatically to help with reasoning steps.

jhart99 8 minutes ago
The underlying paper is from 2022, which should be noted in the title.

Original text

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, by Wenhu Chen and 3 other authors

Abstract: Recently, there has been significant progress in teaching language models to perform step-by-step reasoning to solve complex numerical reasoning tasks. Chain-of-thoughts prompting (CoT) is by far the state-of-the-art method for these tasks. CoT uses language models to perform both reasoning and computation in the multi-step 'thought' process. To disentangle computation from reasoning, we propose 'Program of Thoughts' (PoT), which uses language models (mainly Codex) to express the reasoning process as a program. The computation is relegated to an external computer, which executes the generated programs to derive the answer. We evaluate PoT on five math word problem datasets (GSM, AQuA, SVAMP, TabMWP, MultiArith) and three financial-QA datasets (FinQA, ConvFinQA, TATQA) for both few-shot and zero-shot setups. Under both few-shot and zero-shot settings, PoT can show an average performance gain over CoT of around 12% across all the evaluated datasets. By combining PoT with self-consistency decoding, we can achieve SoTA performance on all math problem datasets and near-SoTA performance on financial datasets. All of our data and code are released on GitHub.
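One way to read "combining PoT with self-consistency decoding" is as a majority vote over the results of several sampled programs. The Python sketch below illustrates that reading under stated assumptions: the sampled programs are hard-coded stand-ins rather than real model output, the helper names `execute` and `majority_answer` are hypothetical, and rounding before voting is a choice made here to group floating-point results, not a detail confirmed by the abstract.

```python
from collections import Counter

# Hypothetical stand-ins for programs sampled from a model for one question
# ("12 pens at 3 dollars each, with a 10% discount").
SAMPLED_PROGRAMS = [
    "ans = 3 * 12 * (1 - 0.10)",
    "total = 3 * 12\nans = total - total * 0.10",
    "ans = 3 * 12 - 0.10",       # a sample with faulty reasoning
    "ans = undefined_name * 2",  # a sample that fails to execute
]


def execute(program, answer_var="ans"):
    """Run one program and return its answer, or None if it fails."""
    namespace = {}
    try:
        # Caution: untrusted generated code should run in a sandbox.
        exec(program, namespace)
        return namespace[answer_var]
    except Exception:
        return None


def majority_answer(programs, ndigits=6):
    """Return the most common executed answer, ignoring failed programs."""
    results = [execute(p) for p in programs]
    # Round so that tiny floating-point differences count as the same vote.
    votes = Counter(round(r, ndigits) for r in results if r is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(majority_answer(SAMPLED_PROGRAMS))
```

In this sketch two samples agree, one gives a different value, and one raises an error and is skipped, so the agreed value wins the vote.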
From: Wenhu Chen
[v1] Tue, 22 Nov 2022 21:06:00 UTC (8,689 KB)
[v2] Fri, 25 Nov 2022 01:49:50 UTC (8,689 KB)
[v3] Tue, 29 Nov 2022 03:46:29 UTC (8,689 KB)
[v4] Mon, 23 Oct 2023 01:27:38 UTC (4,047 KB)