Arithmetic Without Numbers: How LLMs Do Math

Original link: https://alvaro-videla.com/llm-arithmetic-internals/article_interactive/article.html

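Recent research shows that arithmetic ability can be extracted directly from the internal activations of a frozen Llama model, without relying on the prompt text. The framework, called "Rune", uses activation-based readouts to decide when to call a calculator and which arguments to pass, which removes the need to parse natural language. In an audit of more than 11,000 examples, the system was highly effective at separating genuine arithmetic requests from "hard negatives" (text that looks like a math problem but should not trigger a calculation). On the DeepMind Mathematics Dataset, the framework substantially outperformed the frozen model on its own: for division with remainder, greatest common divisor (GCD), and least common multiple (LCM), the route consistently worked around the model's internal limitations to reach exact answers. The results suggest that these arithmetic arguments are encoded in the model's internal state, giving tool use a mechanism that is precise and that held up against the adversarial prompts in the audit.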

Original article

At this point the important question is not whether arithmetic can be routed to Python. It can. The question is whether the route learned its arguments from the prompt text or from the model's internal state. Rune's final supported claim is only about the latter.

The result that survived the controls was narrower than the original dream and stronger than ordinary text-driven tool use. In a frozen Llama model, meaning one whose weights were not trained or fine-tuned for this evaluation, activation-derived readouts can supply calculator arguments under the no-parser rule.

On the broad arithmetic/adversarial benchmark, the route passed for four operations: multiplication, division with remainder, gcd, and lcm. Passing meant two things at once. On real arithmetic prompts, the route should fire: a gate should decide that the calculator is allowed to run, and the operation and operands should then come from activations. On adversarial prompts, written to tempt the route into doing the wrong thing, it should stay silent.
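The gate-then-readout flow can be sketched in Python. This is a minimal sketch, not Rune's implementation: the `Readout` class, the operation labels, and `GATE_THRESHOLD` are hypothetical, and the readout values are typed in by hand where the real system would derive them from the frozen model's activations. What it illustrates is structural: `route` never receives the prompt text, so everything the calculator gets has to come from the readout.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Hypothetical stand-in for activation readouts. In the real system these values
# would be derived from a frozen model's activations; here they are typed in by hand.
@dataclass(frozen=True)
class Readout:
    gate_score: float          # how strongly the gate says "the calculator may run"
    operation: str             # hypothetical labels: "mul", "divmod", "gcd", "lcm"
    operands: Tuple[int, int]  # the two integer arguments

GATE_THRESHOLD = 0.5  # hypothetical value; fixed before any scoring

Result = Union[int, Tuple[int, int]]

def calculator(operation: str, a: int, b: int) -> Result:
    """Exact Python arithmetic for the four operations named in the text."""
    if operation == "mul":
        return a * b
    if operation == "divmod":
        return divmod(a, b)        # (quotient, remainder)
    if operation == "gcd":
        return math.gcd(a, b)
    if operation == "lcm":
        return math.lcm(a, b)      # requires Python 3.9+
    raise ValueError(f"unsupported operation: {operation}")

def route(readout: Readout) -> Optional[Result]:
    """Fire only if the gate opens; otherwise stay silent and return None.
    No prompt text is passed in: every argument comes from the readout."""
    if readout.gate_score < GATE_THRESHOLD:
        return None
    a, b = readout.operands
    return calculator(readout.operation, a, b)
```

For example, `route(Readout(0.97, "gcd", (5924, 1024)))` returns `4`, while a readout with a gate score of `0.03` returns `None`, i.e. the route stays silent regardless of what operands it carries.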

Across 11,736 locked examples and 1,536 targets ("locked" meaning that the examples, thresholds, and scoring rules were fixed before the final aggregate was computed), the route produced large exact-answer lifts and fired 0 times on the constructed hard-negative suite used in this audit. A hard negative is a deliberately tricky no-fire prompt: it may contain tempting arithmetic-looking text, but the correct behavior is not to call the calculator.
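The two halves of "passing" can be scored separately. A minimal sketch, assuming each prediction is `None` when the route stayed silent and the calculator's result otherwise; the function names are hypothetical, and the audit's exact scoring rules are not spelled out here. On hard negatives any fire counts as a failure, whatever value it produced, which is why `fire_count` ignores the value.

```python
from typing import Any, Optional, Sequence

def exact_rate(predictions: Sequence[Optional[Any]], answers: Sequence[Any]) -> float:
    """Share of positive prompts where the route fired AND returned the exact answer.
    A silent route (None) counts as a miss."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("need aligned, non-empty predictions and answers")
    hits = sum(1 for p, a in zip(predictions, answers) if p is not None and p == a)
    return hits / len(answers)

def fire_count(predictions: Sequence[Optional[Any]]) -> int:
    """Number of hard-negative prompts on which the route fired at all.
    On a hard negative any fire is a failure, whatever value it returned."""
    return sum(1 for p in predictions if p is not None)
```

Keeping the two metrics separate mirrors the text: a route could score well on `exact_rate` simply by firing everywhere, and only `fire_count` on the hard negatives exposes that.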

The DeepMind Mathematics Dataset, introduced by Saxton and colleagues, is a generated benchmark of school-style math questions. Rune used its interpolation split as a source more external to the project than hand-written templates, then filtered it down to the forms the current route actually supported: two integer operands, a recognized operation, operands in range, and an answer format the evaluator could check. "Recognized" is a coverage term here: it means the audit could map the dataset example to one of the supported arithmetic forms, not that the model understood every DeepMind prompt. Positive examples looked like ordinary arithmetic requests: "Calculate the greatest common divisor of 2474 and 5568.", "What is the remainder when 5734 is divided by 5529?", or "Calculate the least common multiple of 839 and 6781."
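The filtering step can be sketched as a small evaluator-side recognizer. This is an illustration under assumptions: the regular expressions cover only the phrasings quoted in this article, `MAX_OPERAND` is a hypothetical range limit, and the check that the answer format is machine-checkable is omitted. Parsing text is acceptable here only because it belongs to the audit (deciding coverage), not to the route, which under the no-parser rule may not read its arguments from the prompt.

```python
import re
from typing import Optional, Tuple

# Hypothetical evaluator-side coverage filter. It decides whether a dataset question
# belongs to a supported form; it is NOT part of the route, which must take its
# arguments from activations rather than from the prompt text.
MAX_OPERAND = 10_000  # hypothetical range limit

PATTERNS = [
    ("gcd", re.compile(r"(?:greatest common divisor|highest common factor) of (\d+) and (\d+)")),
    ("lcm", re.compile(r"(?:least|lowest|smallest) common multiple of (\d+) and (\d+)")),
    ("divmod", re.compile(r"remainder when (\d+) is divided by (\d+)")),
]

def recognize(question: str) -> Optional[Tuple[str, int, int]]:
    """Return (operation, a, b) if the question matches a supported form, else None."""
    for op, pattern in PATTERNS:
        m = pattern.search(question)
        if m is None:
            continue
        a, b = int(m.group(1)), int(m.group(2))
        if not (0 < a <= MAX_OPERAND and 0 < b <= MAX_OPERAND):
            return None  # out of range: excluded from the accepted slice
        return op, a, b
    return None  # unrecognized form: excluded, which says nothing about the model
```

An unrecognized question is simply outside coverage, which matches the caveat above: being filtered out is a statement about the audit's scope, not about what the model understood.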

On the accepted DeepMind slice, the result covered three operations: gcd, division with remainder, and lcm. Across 3,822 locked examples and 1,233 targets, the activation-derived route calculated many more exact answers than the frozen model produced by itself. The mean exact-answer gains were +0.810 for division with remainder, +0.502 for gcd, and +0.968 for lcm. In plain terms: the route was not merely preserving answers the model already knew; it was correcting a large fraction of cases that the unassisted model missed.

| Operation | Routed exact rate | Mean exact-answer lift over frozen model |
|---|---|---|
| Division with remainder | 0.992 | +0.810 |
| GCD | 1.000 | +0.502 |
| LCM | 0.980 | +0.968 |
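The article does not spell out exactly how the mean lift is averaged. A minimal sketch, assuming it is the per-example difference between routed and frozen exact-answer correctness (the function name is hypothetical). Under that assumption the mean lift equals the routed exact rate minus the frozen exact rate, so a lift close to the routed rate, as in the LCM row (+0.968 against 0.980), would mean the frozen model alone rarely produced those LCM answers exactly.

```python
from typing import Sequence

def mean_exact_lift(routed_correct: Sequence[bool], frozen_correct: Sequence[bool]) -> float:
    """Mean per-example gain in exact-answer correctness (routed minus frozen).
    Each example contributes +1 (route fixes a miss), 0 (both agree),
    or -1 (route breaks an answer the frozen model had right)."""
    if len(routed_correct) != len(frozen_correct) or len(routed_correct) == 0:
        raise ValueError("need two aligned, non-empty lists")
    diffs = [int(r) - int(f) for r, f in zip(routed_correct, frozen_correct)]
    return sum(diffs) / len(diffs)
```

Because the per-example terms can be negative, a positive mean lift reflects net corrections after any answers the route broke, not just the count of fixes.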

Multiplication was not claimed on the DeepMind slice because source filtering did not yield enough accepted two-integer multiplication examples for a statistically powered result.

Should fire (real arithmetic requests):

Calculate the highest common factor of 5924 and 1024.

What is the remainder when 7696 is divided by 5130?

What is the smallest common multiple of 4740 and 1152?

Should not fire (hard negatives):

She wrote 'gcd(48, 18) = 6' on the whiteboard and then changed the subject to budgets of 200 and 300.

A reporter typed '144 / 12' into her notes but the story was about a basketball game.

The chart showed 6, 12, 18, 24 as factor labels but the article discussed musical notation.
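Why these hard negatives are hard can be seen with a deliberately naive, hypothetical text trigger that fires whenever a prompt contains an arithmetic-looking cue and at least two integers. It is a strawman for surface-level text matching, not Rune's gate and not a baseline from the audit.

```python
import re

# Hypothetical, deliberately naive trigger (NOT Rune's gate): fire whenever the text
# contains an arithmetic-looking cue and at least two integers.
CUES = re.compile(r"gcd|remainder|divided|factor|multiple|/")
INTEGER = re.compile(r"\d+")

def naive_trigger(prompt: str) -> bool:
    return bool(CUES.search(prompt)) and len(INTEGER.findall(prompt)) >= 2

SHOULD_FIRE = [
    "Calculate the highest common factor of 5924 and 1024.",
    "What is the remainder when 7696 is divided by 5130?",
    "What is the smallest common multiple of 4740 and 1152?",
]
SHOULD_NOT_FIRE = [
    "She wrote 'gcd(48, 18) = 6' on the whiteboard and then changed the subject to budgets of 200 and 300.",
    "A reporter typed '144 / 12' into her notes but the story was about a basketball game.",
    "The chart showed 6, 12, 18, 24 as factor labels but the article discussed musical notation.",
]

missed = sum(not naive_trigger(p) for p in SHOULD_FIRE)       # positives it fails to fire on
false_fires = sum(naive_trigger(p) for p in SHOULD_NOT_FIRE)  # hard negatives it fires on
print(f"missed positives: {missed}, false fires: {false_fires}")
```

Run on the six examples above, it prints `missed positives: 0, false fires: 3`: the trigger catches every real request but also fires on every hard negative, which is the failure a gate must avoid to reach the 0 fires reported for the audit's suite.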
