(comments)

Original link: https://news.ycombinator.com/item?id=43618178

A Hacker News thread on the article "Deep Learning, Deep Scandal" drew a heated discussion. The article argues that large language models (LLMs) are failing to live up to expectations and that companies are making dodgy choices to keep that fact from becoming obvious. Commenters debated the article's thesis: some argued that performance on proof-based tasks (such as the USA Mathematical Olympiad) exposes the models' limitations, even though they perform strongly on computation-heavy tasks. Some are skeptical of AGI and focus instead on the practical usefulness of current tools, while others worry that inflated expectations will misdirect investment in LLM technology. The discussion also touched on a possible "AI winter" as the initial hype fades. Some commenters regard LLMs as highly effective coding tools; others caution against blindly trusting AI-generated code. Overall, the comments lean toward growing skepticism about LLMs' long-term potential, with some predicting a shift in focus toward efficiency and local model hosting.

Related Articles
  • Deep Learning, Deep Scandal 2025-04-08
  • (comments) 2025-04-06
  • (comments) 2025-03-27
  • (comments) 2025-03-08
  • (comments) 2024-08-07

  • Original
    [flagged] Deep Learning, Deep Scandal (garymarcus.substack.com)
    30 points by imichael 44 minutes ago | 11 comments

    The linked USAMO math results are in an exam that requires proofs. The same authors, on the same website, ran AIME 2025 shortly after it happened and found it was totally consistent with the o1 announcement numbers; the difference being that the AIME requires only short answers and no proof.

    If you are a skilled mathematician, it is quite easy to verify that (as of 7th April) models both excel at novel calculations on held-out problems and mostly shit the bed when asked for proofs.

    Gary cites these USAMO results as evidence of contamination influencing benchmark results, but that view is not consistent with the models' strong performance on clearly held-out tasks (ARC, AIME 25, HMMT 25, etc.).

    If you really care, you can test this by inventing problems! It is a very very verifiable claim about the world.
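
    A minimal sketch of that test in Python, under assumptions not in the comment (the OpenAI SDK and an illustrative model name): invent a random calculation so the exact question cannot exist in any training set, then score the reply.

        # Hypothetical harness for the "invent a problem" test: random
        # operands guarantee the exact question is held out of training data.
        import random
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        a = random.randint(10**6, 10**7)
        b = random.randint(10**3, 10**4)
        question = f"Compute {a} mod {b}. Reply with the number only."
        expected = a % b

        resp = client.chat.completions.create(
            model="gpt-4o",  # illustrative choice; swap in the model under test
            messages=[{"role": "user", "content": question}],
        )
        answer = resp.choices[0].message.content.strip()
        print("correct" if answer == str(expected)
              else f"wrong: got {answer}, expected {expected}")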

    In any case, this is not the pundit you want. There are many ways to make a bear case that are much saner than this.



    Does any of this matter if you're a person that thinks "AGI" is a silly concept, and just uses these tools for what they're good at currently?

    I'm not trying to be snarky, I'm just wondering why I would care that a tech giant has failed to cross the "GPT-5" threshold. What's the significance of that to an ordinary user?



    Yes, the quality of the models is increasing at a slower rate and the race will transition to performance and efficiency.

    This is good for self-hosters and devs who will be able to run near-SOTA models like QwQ locally. I’m near the point where I’m going to cancel my ChatGPT Plus and Claude subscriptions.

    If you’re not already trying to self-host, build your own local agents, and build your own MCPs/tools, I would encourage you to try it. If you don’t have a fancy GPU or an M1+, try out QwQ on Groq or Flash 2.0 Lite with the Gemini API; they’re super cheap and fast, and basically equivalent to (if not better than) the ChatGPT you were paying for 16 months ago.
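
    As one concrete starting point, here is a minimal sketch of the Gemini API route, assuming the google-generativeai Python package and guessing "gemini-2.0-flash-lite" as the model id for Flash 2.0 Lite (check Google's current model list):

        # Hypothetical quick start for Flash 2.0 Lite via the Gemini API.
        import os
        import google.generativeai as genai

        genai.configure(api_key=os.environ["GEMINI_API_KEY"])
        model = genai.GenerativeModel("gemini-2.0-flash-lite")  # assumed id

        resp = model.generate_content("Explain MCP servers in two sentences.")
        print(resp.text)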



    If people think that LLMs will get very good at tons of things, they will invest quite a bit of time figuring out how to work with them (even if they are not that great at useful things right now). If those people then learn that LLMs will never get very good at so many things, they will then tend to invest less time in studying up on how to best use them.

    I know I write off some of the time I spend working with LLMs as an investment in the future. If someone told me this is as good as they'll get, I would definitely invest less time working with them.



    If you're interested in it as a tool, you can skip this stuff.

    This is more for those curious about whether AI is tulip bulbs.

    There are unintentional ideological camps on AI; one is mad about a financial/interest bubble*, and it is sustained by content like this every 4 to 8 weeks. (Randomly selected representative comment, in this same discussion: https://news.ycombinator.com/item?id=43618256)

    * reminiscent, to me, of how I felt about Uber for years and years and years until I sort of moved on when it survived COVID.



    One of the comments in the article says: "I don't see how it's not a net negative tech," to which Marcus replies: "That’s my current tentative conclusion, yes."

    What is the negative effect I'm not seeing? Bad code? Economic waste in datacenter investment? Wasted effort of researchers who could be solving other problems?

    I've been writing software for over a decade, and I’ve never been as productive as I am now. Jumping into a new codebase - even in unfamiliar areas like a React frontend - is so much easier. I’m routinely contributing to frontend projects, which I never did before.

    There is some discipline required to avoid the temptation to just push AI-generated code, but otherwise, it works like magic.



    Really just a matter of when the bubble pops now, isn't it? There's just too much substantial evidence pointing to the fact that AI is simply not going to be the product the big players say it will be.


    Yeah, no shit. Winter is coming again for AI:

    https://en.wikipedia.org/wiki/AI_winter

    That said, the techniques that have come out of machine-learning research in recent years are indeed powerful for certain, constrained purposes. That was true of other AI technologies in the past, but we don't call them AI anymore; we call them things like "rules engines". These days, you could take a current-year press release about AI, use a cloud-to-butt-style filter to replace all occurrences of "AI" with "statistics", and see virtually zero drop in factuality (though a considerable drop in market sizzle).



    The technology is just a couple of years old, and this article is drawn from a couple of months of evidence.

    We can't yet say what the future holds. The naysayers who were so confident that LLMs were stochastic parrots are now embarrassingly wrong, and this article sounds like that. Whether we are actually at a dead end is unknown. Why are people talking with such utter conviction when nobody truly understands what's going on internally with LLMs?



    I know, it was a scary period for programmers. The tide is turning. Meatsuits are back in the game.


    > The reality, reported or otherwise, is that large language models are no longer living up to expectations, and its purveyors appear to be making dodgy choices to keep that fact from becoming obvious.

    What else is new?











