Original link: https://news.ycombinator.com/item?id=43618178

A Hacker News thread on the article "Deep Learning, Deep Scandal" drew heated discussion. The article argues that large language models (LLMs) have failed to live up to expectations and that companies are using questionable tactics to obscure this fact. Commenters debated the article's claims: some argued that performance on proof-based tasks (such as the USA Mathematical Olympiad) exposes the models' limitations, even though they perform strongly on computation-heavy tasks. Some are skeptical of AGI and focus instead on the practical utility of current tools, while others worry that inflated expectations will misdirect investment in LLM technology. The discussion also touched on a possible "AI winter" once the initial hype fades. Some commenters see LLMs as highly effective coding tools, while others caution against blindly trusting AI-generated code. Overall, the comments lean toward growing skepticism about the long-term potential of LLMs, with some predicting a shift in focus toward efficiency and local model hosting.
If you are a skilled mathematician, it is quite easy to verify that (as of 7th April) models both excel at novel calculations on held-out problems and mostly shit the bed when asked for proofs.
Gary cites these USAMO results as evidence that contamination is inflating benchmark results, but that view is not consistent with the models' strong performance on clearly held-out tasks (ARC, AIME 25, HMMT 25, etc.).
If you really care, you can test this yourself by inventing problems! It is a very, very verifiable claim about the world.
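If you want to run that experiment yourself, here is a minimal sketch in Python. `ask_model` is a hypothetical stand-in for whatever interface you actually use (an API client, a local model, or even manual copy-paste); the point is just that freshly generated random problems cannot have appeared verbatim in any training set.

```python
import random

def make_problem():
    # Generate a fresh multi-digit multiplication problem. The operands
    # are drawn at random, so the exact question is held out by construction.
    a, b = random.randint(10**5, 10**6), random.randint(10**5, 10**6)
    return f"What is {a} * {b}? Reply with only the number.", a * b

def score(ask_model, n_trials=20):
    # `ask_model` is a hypothetical callable: prompt (str) -> response (str).
    # Swap in whatever chat API or local model you actually use.
    correct = 0
    for _ in range(n_trials):
        prompt, answer = make_problem()
        reply = ask_model(prompt)
        # Strip everything but digits before comparing, since models often
        # add commas or surrounding text despite instructions.
        digits = "".join(ch for ch in reply if ch.isdigit())
        correct += digits == str(answer)
    return correct / n_trials
```

Calculation is the easy half of the claim to check; for the proof half you would need to invent a novel proof-based problem and grade the model's argument by hand, which is exactly where commenters report it falling over.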
In any case, this is not the pundit you want. There are many ways to make a bear case that are much saner than this.