(comments)

Original link: https://news.ycombinator.com/item?id=43618178

A Hacker News thread on the article "Deep Learning, Deep Scandal" drew a heated discussion. The article argues that large language models (LLMs) are failing to live up to expectations and that companies are making dodgy choices to keep that fact from becoming obvious. Commenters debated the article's thesis: some argued that performance on proof-based tasks (such as the USA Mathematical Olympiad) exposes the models' limitations, even though they perform strongly on computation-heavy tasks. Some are skeptical of AGI and focus instead on the practical usefulness of current tools, while others worry that inflated expectations will misdirect investment in LLM technology. The discussion also touched on a possible "AI winter" as the initial hype fades. Some commenters regard LLMs as highly effective coding tools; others caution against blindly trusting AI-generated code. Overall, the comments lean toward growing skepticism about LLMs' long-term potential, with some predicting a shift in focus toward efficiency and local model hosting.

Related Articles
  • Deep Learning, Deep Scandal 2025-04-08
  • (comments) 2025-04-06
  • (comments) 2025-03-27
  • (comments) 2025-03-08
  • (comments) 2024-08-07

  • Original
    [flagged] Deep Learning, Deep Scandal (garymarcus.substack.com)
    30 points by imichael 44 minutes ago | 11 comments

    The linked USAMO math results are in an exam that requires proofs. The same authors, on the same website, ran AIME 2025 shortly after it happened and found it was totally consistent with the o1 announcement numbers; the difference being that the AIME requires only short answers and no proof.

    If you are a skilled mathematician, it is quite easy to verify that (as of 7th April) models both excel at novel calculations on held-out problems and mostly shit the bed when asked for proofs.

    Gary cites these USAMO results as evidence of contamination influencing benchmark results, but that view is not consistent with the models' strong performance on clearly held-out tasks (ARC, AIME 25, HMMT 25, etc.).

    If you really care, you can test this by inventing problems! It is a very very verifiable claim about the world.
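
    A minimal sketch of that test in Python, under assumptions not in the comment (the OpenAI SDK and an illustrative model name): invent a random calculation so the exact question cannot exist in any training set, then score the reply.

        # Hypothetical harness for the "invent a problem" test: random
        # operands guarantee the exact question is held out of training data.
        import random
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        a = random.randint(10**6, 10**7)
        b = random.randint(10**3, 10**4)
        question = f"Compute {a} mod {b}. Reply with the number only."
        expected = a % b

        resp = client.chat.completions.create(
            model="gpt-4o",  # illustrative choice; swap in the model under test
            messages=[{"role": "user", "content": question}],
        )
        answer = resp.choices[0].message.content.strip()
        print("correct" if answer == str(expected)
              else f"wrong: got {answer}, expected {expected}")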

    In any case, this is not the pundit you want. There are many ways to make a bear case that are much saner than this.



    Does any of this matter if you're a person that thinks "AGI" is a silly concept, and just uses these tools for what they're good at currently?

    I'm not trying to be snarky, I'm just wondering why I would care that a tech giant has failed to cross the "GPT-5" threshold. What's the significance of that to an ordinary user?



    Yes, the quality of the models is increasing at a slower rate and the race will transition to performance and efficiency.

    This is good for self-hosters and devs who will be able to run near-SOTA models like QwQ locally. I’m near the point where I’m going to cancel my ChatGPT Plus and Claude subscriptions.

    If you’re not already trying to self-host, build your own local agents, and build your own MCPs/tools, I would encourage you to try it. If you don’t have a fancy GPU or an M1+, try out QwQ on Groq or Flash 2.0 Lite with the Gemini API; they’re super cheap and fast, and basically equivalent to (if not better than) the ChatGPT you were paying for 16 months ago.
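
    As one concrete starting point, here is a minimal sketch of the Gemini API route, assuming the google-generativeai Python package and guessing "gemini-2.0-flash-lite" as the model id for Flash 2.0 Lite (check Google's current model list):

        # Hypothetical quick start for Flash 2.0 Lite via the Gemini API.
        import os
        import google.generativeai as genai

        genai.configure(api_key=os.environ["GEMINI_API_KEY"])
        model = genai.GenerativeModel("gemini-2.0-flash-lite")  # assumed id

        resp = model.generate_content("Explain MCP servers in two sentences.")
        print(resp.text)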



    If people think that LLMs will get very good at tons of things, they will invest quite a bit of time figuring out how to work with them (even if they are not that great at useful things right now). If those people then learn that LLMs will never get very good at so many things, they will then tend to invest less time in studying up on how to best use them.

    I know I write off some of the time I spend working with LLMs as an investment in the future. If someone told me this is as good as they'll get, I would definitely invest less time working with them.



    If you're interested in it as a tool, you can skip this stuff.

    This is more for those curious about whether AI is tulip bulbs.

    There are unintentional ideological camps on AI; one is mad about a financial/interest bubble*, and it is sustained by content like this every 4 to 8 weeks. (Randomly selected representative comment, in this same discussion: https://news.ycombinator.com/item?id=43618256)

    * reminiscent, to me, of how I felt about Uber for years and years and years until I sort of moved on when it survived COVID.



    One of the comments in the article says: "I don't see how it's not a net negative tech," to which Marcus replies: "That’s my current tentative conclusion, yes."

    What is the negative effect I'm not seeing? Bad code? Economic waste in datacenter investment? Wasted effort of researchers who could be solving other problems?

    I've been writing software for over a decade, and I’ve never been as productive as I am now. Jumping into a new codebase - even in unfamiliar areas like a React frontend - is so much easier. I’m routinely contributing to frontend projects, which I never did before.

    There is some discipline required to avoid the temptation to just push AI-generated code, but otherwise, it works like magic.



    Really just a matter of when the bubble pops now, isn't it? There's just too much substantial evidence pointing to the fact that AI is simply not going to be the product the big players say it will be.


    Yeah, no shit. Winter is coming again for AI:

    https://en.wikipedia.org/wiki/AI_winter

    That said, the techniques that have come out of machine-learning research in recent years are indeed powerful for certain, constrained purposes. That was true of other AI technologies in the past, but we don't call them AI anymore; we call them things like "rules engines". These days, you could take a current-year press release about AI, use a cloud-to-butt-style filter to replace all occurrences of "AI" with "statistics", and see virtually zero drop in factuality (though a considerable drop in market sizzle).



    The technology is just a couple of years old, and this article is drawn from a couple of months of evidence.

    We can't yet say what the future holds. The naysayers who were so confident that LLMs were stochastic parrots are now embarrassingly wrong, and this article sounds like that. Whether we are actually at a dead end is unknown. Why are people talking with such utter conviction when nobody truly understands what's going on internally with LLMs?



    I know, it was a scary period for programmers. The tide is turning. Meatsuits are back in the game.


    > The reality, reported or otherwise, is that large language models are no longer living up to expectations, and its purveyors appear to be making dodgy choices to keep that fact from becoming obvious.

    What else is new?











