(comments)

Original link: https://news.ycombinator.com/item?id=43498338

A Hacker News discussion thread questioning large language models (LLMs), prompted by a critical Twitter post. Commenters report differing experiences. Some find LLMs genuinely useful for narrow tasks - scripting, code generation, CSV manipulation, and automating shell commands - with real efficiency gains. Others argue that current valuations require LLMs to replace humans across much broader roles, and that celebrating narrow successes overlooks significant failures. One rebuttal holds that the "chatbot" interface is far from the best application of the technology, pointing to LLMs excelling at tasks like speech transcription despite occasional "hallucinations," and highlighting their potential as a transformative assistive technology. Another commenter notes that checking whether generated URLs are valid is a harder task than people realize, since it adds considerable overhead and latency. Overall, the thread reflects an ongoing debate over LLMs' current capabilities and future potential, with views shaped by each person's use cases and expectations.


Original text
Hacker News new | past | comments | ask | show | jobs | submit login
I genuinely don't understand why some people are still bullish about LLMs (twitter.com/skdh)
41 points by ksec 40 minutes ago | 10 comments

My experience (almost exclusively Claude) has just been so different that I don't know what to say. Some of the examples are the kinds of things I explicitly wouldn't expect LLMs to be particularly good at, so I wouldn't use them for those; for others, she says it just doesn't work for her, and that experience is so different from mine that I don't know how to respond.

I think that there are two kinds of people who use AI: people who are looking for the ways in which AIs fail (of which there are still many) and people who are looking for the ways in which AIs succeed (of which there are also many).

A lot of what I do is relatively simple one off scripting. Code that doesn't need to deal with edge cases, won't be widely deployed, and whose outputs are very quickly and easily verifiable.

LLMs are almost perfect for this. It's generally faster than me looking up syntax/documentation, when it's wrong it's easy to tell and correct.

Look for the ways that AI works, and it can be a powerful tool. Try and figure out where it still fails, and you will see nothing but hype and hot air. Not every use case is like this, but there are many.
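As a concrete illustration (the task and data here are hypothetical, not from the thread): this is the kind of one-off script where an LLM-generated answer is trivially checkable, because the whole output fits on one line.

```python
# Hypothetical one-off task: dedupe a list of hostnames while preserving order.
# The result is small enough to verify at a glance, which is exactly the
# property that makes this kind of scripting a good fit for LLM codegen.
hosts = ["db1", "web1", "db1", "cache1", "web1"]
seen = set()
deduped = [h for h in hosts if not (h in seen or seen.add(h))]
print(deduped)  # ['db1', 'web1', 'cache1']
```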

-edit- Also, when she says "none of my students has ever invented references that just don't exist"...all I can say is "press X to doubt"



The point is that given the current valuations, being good at a bunch of narrow use cases is just not good enough. It needs to be able to replace humans in every role where the primary output is text or speech to meet expectations.


You're using them wrong. Everyone is, though, so I can't fault you specifically. A chatbot is just about the worst possible application of these technologies.

Of late, deaf tech forums have been taken over by debates over which language model works best for speech transcription. (Multimodal language models are the state of the art in machine transcription. Everyone seems to forget that when complaining that they can't cite sources for scientific papers yet.) The debates have reached the point where it's annoying how much space they take up, just like they have here on HN.

But then I remember: oh yeah, there was no such thing as live machine transcription ten years ago. And now there is. And it's going to continue to get better. It's already good enough to be very useful in many situations. I have elsewhere complained about the faults of AI models for machine transcription - in particular, when they make mistakes they tend to hallucinate something superficially grammatical and coherent instead - but for a sporadic phrase in an audio transcription that's sometimes tolerable. In many cases you still want a human transcriber, but at that cost the demand for transcription can never be fully met.

It's a revolutionary technology. I think in a few years I'm going to have glasses that continuously narrate the sounds around me and transcribe speech, and it's going to be so good I can probably "pass" as a hearing person in some contexts. It's hard not to get a bit giddy and carried away sometimes.



> You're using them wrong. Everyone is though I can't fault you specifically.

If everyone is using them wrong, I would argue that says more about them than about the users. Chat-based interfaces are what kicked LLMs into the mainstream consciousness and set the current hype cycle on its trajectory. If that is the wrong use case, everything the author said still stands.

There are still applications made better by LLMs, but they are a far cry from AGI/ASI in terms of being all-knowing problem solvers that don’t make mistakes. Language tasks like transcription and translation are valuable, but by no stretch do they account for the billions of dollars of spend on these platforms, I would argue.



Because it’s not a scientific research tool; it’s a most-likely-next-text generator. It doesn’t keep a database of ingested information with source URLs. There are plenty of scientific research tools, but something that just outputs text based on your input is no good for this.

I’m sure that in the future there will be a really good search tool that utilises an LLM, but for now a plain model just isn’t designed for that. There are a ton of other uses for them, so I don’t think we should discount them entirely based on their ability to output citations.
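A toy sketch of the distinction being described (all names and URLs here are made up): in a retrieval-backed search tool, cited URLs come from an index the system actually holds, so by construction they cannot be invented the way free-form generation invents them. A plain model has no such index to draw on.

```python
def retrieve(query, index):
    """Naive keyword search over a toy {url: text} index. Real systems
    would use proper ranking, but the point is the same: every URL
    returned exists in the index by construction."""
    terms = set(query.lower().split())
    scored = []
    for url, text in index.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, url))
    return [url for _, url in sorted(scored, reverse=True)]

# Hypothetical index of documents the tool has actually ingested.
index = {
    "https://example.com/transformers": "attention is all you need transformers",
    "https://example.com/cnns": "convolutional networks for image recognition",
}
print(retrieve("transformers attention", index))
# ['https://example.com/transformers']
```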



My experience is starkly different. Today I used LLMs to:

1. Write python code for a new type of loss function I was considering

2. Perform lots of annoying CSV munging ("split this CSV into 4 equal parts", "convert paths in this column into absolute paths", "combine these and then split into 4 distinct subsets based on this field.." - they're great for that)

3. Expedite some basic shell operations like "generate softlinks for 100 randomly selected files in this directory"

4. Generate some summary plots of the data in the files I was working with

5. Not to mention extensive use in Cursor & GH Copilot

The tool (Claude 3.7 mostly, integrated with my shell so it can execute shell commands and run Python locally) worked great in all cases. Yes, I could've done most of it myself, but I personally hate CSV munging and bulk file manipulations, and it's super nice to delegate that stuff to an LLM agent.
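The CSV-splitting task in point 2 can be sketched with the standard library alone (the column names and sample data here are invented for illustration):

```python
import csv
import io

def split_csv(text, n_parts):
    """Split CSV text into n_parts roughly equal chunks, each keeping
    the header row so every chunk is a valid CSV on its own."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    size = -(-len(body) // n_parts)  # ceiling division
    parts = []
    for i in range(0, len(body), size):
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(body[i:i + size])
        parts.append(out.getvalue())
    return parts

# Hypothetical input: 10 data rows split into 4 chunks.
sample = "id,path\n" + "\n".join(f"{i},file{i}.txt" for i in range(10))
parts = split_csv(sample, 4)
print(len(parts))  # 4
```

Repeating the header per chunk is the design choice that makes the pieces independently usable downstream.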

edit: formatting



These seem like fine use cases: trivial boilerplate stuff you’d otherwise have to search for and then munge to fit your exact need. An LLM can often do both steps for you. If it doesn’t work, you’ll know immediately and you can probably figure out whether it’s a quick fix or if the LLM is completely off-base.


How did you integrate Claude into your shell?


We've had the opposite experience, especially with o3-mini using Deep Research for market research & topic deep-dive tasks. The sources that are pulled have never been 404 for us, and typically have been highly relevant to the search prompt. It's been a huge time-saver. We are just scratching the surface of how good these LLMs will become at research tasks.


People who don't work in tech have no idea how hard it is to do certain things at scale. Skilled tech people are severely underappreciated.

From a sub-tweet:

>> no LLM should ever output a url that gives a 404 error. How hard can it be?

As a developer, I'm just imagining a server having to call up all the URLs to check that they still exist (and the extra costs/latency incurred there)... And if any URLs are missing, getting the AI to re-generate a different variant of the response, until you find one which does not contain the missing links.

And no, you can't do it from the client side either... It would just be confusing if you removed invalid URLs from the middle of the AI's sentence without re-generating the sentence.

All this stuff adds complexity, cost, latency.
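A rough sketch of what that server-side check would entail (the timeout and worker count are arbitrary assumptions): even parallelized across a thread pool, every response now pays at least one extra network round trip, and any broken link would trigger a full re-generation on top of that.

```python
import concurrent.futures
import urllib.request

def url_ok(url, timeout=5):
    """HEAD-check one URL; any network or parse error counts as broken."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

def broken_links(urls, max_workers=8):
    """Check candidate URLs concurrently; return the subset that failed.
    This is the added cost being described: one round trip per link,
    per generated draft, before the response can be shown to the user."""
    if not urls:
        return []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(url_ok, urls))
    return [u for u, ok in zip(urls, results) if not ok]
```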











