LLMs Are Currently Not Helpful at All for Math Research: Hamkins

原始链接: https://officechai.com/ai/llms-are-currently-not-helpful-at-all-for-math-research-give-garbage-answers-mathematician-joel-david-hamkins/

Despite recent claims that AI has solved mathematics problems (including problems from the Erdős collection), the mathematical community remains skeptical. Mathematician Joel David Hamkins shared his disappointing experience with large language models on a recent podcast, finding them "not helpful at all" and consistently producing incorrect answers. His chief concern is not merely accuracy: the AI gives wrong answers *confidently* and refuses to acknowledge its mistakes, a trait he finds frustrating and counterproductive. This stands in contrast to the collaborative nature of mathematical research, which depends on trust and constructive criticism. Other mathematicians, such as Terence Tao, have voiced similar concerns, noting that AI can generate proofs that look flawless while hiding subtle flaws. While acknowledging that future systems may improve, Hamkins remains skeptical, stressing the significant gap between AI performance on benchmarks and its practical usefulness as a reliable research partner for mathematicians.

## LLMs and Math Research: A Skeptical View

A discussion recently appeared on Hacker News about the usefulness of large language models (LLMs) in mathematical research. Despite the heavy hype around and investment in AI, some mathematicians, like Hamkins, argue that LLMs are "basically not helpful at all," chiefly because of their tendency to give incorrect answers and their frustrating interaction patterns. This skepticism contrasts with the experience of others such as Terence Tao, who have found LLMs useful, though mostly for verifying existing proofs rather than discovering new ones. Commenters also worried that financial stakes may bias positive assessments. Many emphasized pairing LLMs with formal proof systems to at least guarantee correctness, though verifying a proof's *relevance* remains a challenge. Others noted that LLMs excel at reproducing known solutions but struggle with complex, novel problems. A recurring theme was the need for rigorous testing that goes beyond superficial use, along with a wariness of both overhyping the technology and dismissing its potential outright.
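To make the "correctness versus relevance" point concrete, here is a minimal Lean 4 sketch (illustrative only, not drawn from the article or the discussion; the theorem name is invented). If an LLM-suggested proof elaborates, the Lean kernel certifies that the statement is correct, but that certification says nothing about whether the lemma actually matters for the research question at hand.

```lean
-- Minimal illustrative sketch (not from the article): a statement an LLM might
-- propose, together with a proof term. If this file compiles, the Lean kernel
-- has certified that the proof is correct; it cannot certify that the lemma is
-- useful or relevant to the research problem being studied.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```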

Original Article

There are some experts who say that they've used AI to solve Erdős problems and help with mathematics research, but not everyone is on board yet.

Joel David Hamkins, a prominent mathematician and professor of logic at the University of Notre Dame, recently shared his unvarnished assessment of large language models in mathematical research during an appearance on the Lex Fridman podcast. His experience stands in sharp contrast to the optimistic narratives surrounding AI’s potential in scientific discovery, and his critique centers on a fundamental issue: mathematical correctness.

“I guess I would draw a distinction between what we have currently and what might come in future years,” Hamkins began, acknowledging the possibility of future progress. “I’ve played around with it and I’ve tried experimenting, but I haven’t found it helpful at all. Basically zero. It’s not helpful to me. And I’ve used various systems and so on, the paid models and so on.”

His experience with current AI systems has been consistently disappointing. “My typical experience is interacting with AI on a mathematical question is that it gives me garbage answers that are not mathematically correct, and so I find that not helpful and also frustrating,” he explained. The frustration, for Hamkins, goes beyond mere incorrectness—it’s the nature of the interaction itself that proves problematic.

“The frustrating thing is when you have to argue about whether or not the argument that they gave you is right. And you point out exactly the error,” Hamkins said, describing exchanges where he identifies specific flaws in the AI’s reasoning. The AI’s response? “Oh, it’s totally fine.” This pattern of confident incorrectness followed by dismissal of legitimate criticism mirrors a type of human interaction that Hamkins finds untenable: “If I were having such an experience with a person, I would simply refuse to talk to that person again.”

Despite these issues, Hamkins recognizes that current limitations may not be permanent. “One has to overlook these kind of flaws and so I tend to be a kind of skeptic about the value of the current AI systems. As far as mathematical reasoning is concerned, it seems not reliable.”

Hamkins’ assessment highlights a critical tension in the AI community. While some researchers have reported breakthroughs—such as claims of AI assistance in tackling problems from the Erdős collection of mathematical challenges—some working mathematicians like Hamkins are finding current systems fundamentally unreliable for serious research. Mathematician Terence Tao has said that AI can generate mathematical proofs that look flawless, but make subtle mistakes that humans wouldn’t. The issue isn’t just that LLMs make mistakes, but that they make them with confidence and resist correction, breaking the collaborative trust essential to mathematical discourse.

As AI companies continue to invest heavily in reasoning capabilities and mathematical problem-solving, Hamkins’ experience serves as a sobering reminder that impressive benchmarks don’t always translate to practical utility for domain experts. The gap between AI performance on standardized tests and its ability to serve as a genuine research partner to some mathematicians remains wide, at least for now.
