(comments)

Original link: https://news.ycombinator.com/item?id=43498338

A Hacker News discussion thread questioning large language models (LLMs), prompted by a critical Twitter post. Commenters report differing experiences. Some find LLMs genuinely useful for narrow tasks - scripting, code generation, CSV manipulation, and automating shell commands - with real efficiency gains. Others argue that current valuations require LLMs to replace humans across much broader roles, and that celebrating narrow successes overlooks significant failures. One rebuttal holds that the "chatbot" interface is far from the best application of the technology, pointing to LLMs excelling at tasks like speech transcription despite occasional "hallucinations," and highlighting their potential as a transformative assistive technology. Another commenter notes that checking whether generated URLs are valid is a harder task than people realize, since it adds considerable overhead and latency. Overall, the thread reflects an ongoing debate over LLMs' current capabilities and future potential, with views shaped by each person's use cases and expectations.


Original text
Hacker News new | past | comments | ask | show | jobs | submit login
I genuinely don't understand why some people are still bullish about LLMs (twitter.com/skdh)
41 points by ksec 40 minutes ago | 10 comments

My experience (almost exclusively Claude) has just been so different that I don't know what to say. Some of the examples are the kinds of things I explicitly wouldn't expect LLMs to be particularly good at, so I wouldn't use them for those; for others, she says it just doesn't work for her, and that experience is so different from mine that I don't know how to respond.

I think that there are two kinds of people who use AI: people who are looking for the ways in which AIs fail (of which there are still many) and people who are looking for the ways in which AIs succeed (of which there are also many).

A lot of what I do is relatively simple one off scripting. Code that doesn't need to deal with edge cases, won't be widely deployed, and whose outputs are very quickly and easily verifiable.

LLMs are almost perfect for this. It's generally faster than me looking up syntax/documentation, when it's wrong it's easy to tell and correct.

Look for the ways that AI works, and it can be a powerful tool. Try and figure out where it still fails, and you will see nothing but hype and hot air. Not every use case is like this, but there are many.
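As a concrete illustration (the task and data here are hypothetical, not from the thread): this is the kind of one-off script where an LLM-generated answer is trivially checkable, because the whole output fits on one line.

```python
# Hypothetical one-off task: dedupe a list of hostnames while preserving order.
# The result is small enough to verify at a glance, which is exactly the
# property that makes this kind of scripting a good fit for LLM codegen.
hosts = ["db1", "web1", "db1", "cache1", "web1"]
seen = set()
deduped = [h for h in hosts if not (h in seen or seen.add(h))]
print(deduped)  # ['db1', 'web1', 'cache1']
```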

-edit- Also, when she says "none of my students has ever invented references that just don't exist"...all I can say is "press X to doubt"



The point is that given the current valuations, being good at a bunch of narrow use cases is just not good enough. It needs to be able to replace humans in every role where the primary output is text or speech to meet expectations.


You're using them wrong. Everyone is, though, so I can't fault you specifically. A chatbot is just about the worst possible application of these technologies.

Of late, deaf tech forums have been taken over by debates over which language model works best for speech transcription. (Multimodal language models are the state of the art in machine transcription. Everyone seems to forget that when complaining that they can't cite sources for scientific papers yet.) The debates have reached the point where it's annoying how much space they take up, just like they have here on HN.

But then I remember: oh yeah, there was no such thing as live machine transcription ten years ago. And now there is. And it's going to continue to get better. It's already good enough to be very useful in many situations. I have elsewhere complained about the faults of AI models for machine transcription - in particular, when they make mistakes they tend to hallucinate something superficially grammatical and coherent instead - but for a sporadic phrase in an audio transcription that's sometimes tolerable. In many cases you still want a human transcriber, but at that cost the demand for transcription can never be fully met.

It's a revolutionary technology. I think in a few years I'm going to have glasses that continuously narrate the sounds around me and transcribe speech, and it's going to be so good I can probably "pass" as a hearing person in some contexts. It's hard not to get a bit giddy and carried away sometimes.



> You're using them wrong. Everyone is though I can't fault you specifically.

If everyone is using them wrong, I would argue that says more about them than about the users. Chat-based interfaces are what kicked LLMs into the mainstream consciousness and set the current hype cycle on its trajectory. If that is the wrong use case, everything the author said still stands.

There are still applications made better by LLMs, but they are a far cry from AGI/ASI in terms of being all-knowing problem solvers that don’t make mistakes. Language tasks like transcription and translation are valuable, but by no stretch do they account for the billions of dollars of spend on these platforms, I would argue.



Because it’s not a scientific research tool; it’s a most-likely-next-text generator. It doesn’t keep a database of ingested information with source URLs. There are plenty of scientific research tools, but something that just outputs text based on your input is no good for this.

I’m sure that in the future there will be a really good search tool that utilises an LLM, but for now a plain model just isn’t designed for that. There are a ton of other uses for them, so I don’t think we should discount them entirely based on their ability to output citations.
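A toy sketch of the distinction being described (all names and URLs here are made up): in a retrieval-backed search tool, cited URLs come from an index the system actually holds, so by construction they cannot be invented the way free-form generation invents them. A plain model has no such index to draw on.

```python
def retrieve(query, index):
    """Naive keyword search over a toy {url: text} index. Real systems
    would use proper ranking, but the point is the same: every URL
    returned exists in the index by construction."""
    terms = set(query.lower().split())
    scored = []
    for url, text in index.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, url))
    return [url for _, url in sorted(scored, reverse=True)]

# Hypothetical index of documents the tool has actually ingested.
index = {
    "https://example.com/transformers": "attention is all you need transformers",
    "https://example.com/cnns": "convolutional networks for image recognition",
}
print(retrieve("transformers attention", index))
# ['https://example.com/transformers']
```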



My experience is starkly different. Today I used LLMs to:

1. Write python code for a new type of loss function I was considering

2. Perform lots of annoying CSV munging ("split this CSV into 4 equal parts", "convert paths in this column into absolute paths", "combine these and then split into 4 distinct subsets based on this field.." - they're great for that)

3. Expedite some basic shell operations like "generate softlinks for 100 randomly selected files in this directory"

4. Generate some summary plots of the data in the files I was working with

5. Not to mention extensive use in Cursor & GH Copilot

The tool (Claude 3.7 mostly, integrated with my shell so it can execute shell commands and run Python locally) worked great in all cases. Yes, I could've done most of it myself, but I personally hate CSV munging and bulk file manipulations, and it's super nice to delegate that stuff to an LLM agent.
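The CSV-splitting task in point 2 can be sketched with the standard library alone (the column names and sample data here are invented for illustration):

```python
import csv
import io

def split_csv(text, n_parts):
    """Split CSV text into n_parts roughly equal chunks, each keeping
    the header row so every chunk is a valid CSV on its own."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    size = -(-len(body) // n_parts)  # ceiling division
    parts = []
    for i in range(0, len(body), size):
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(body[i:i + size])
        parts.append(out.getvalue())
    return parts

# Hypothetical input: 10 data rows split into 4 chunks.
sample = "id,path\n" + "\n".join(f"{i},file{i}.txt" for i in range(10))
parts = split_csv(sample, 4)
print(len(parts))  # 4
```

Repeating the header per chunk is the design choice that makes the pieces independently usable downstream.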

edit: formatting



These seem like fine use cases: trivial boilerplate stuff you’d otherwise have to search for and then munge to fit your exact need. An LLM can often do both steps for you. If it doesn’t work, you’ll know immediately and you can probably figure out whether it’s a quick fix or if the LLM is completely off-base.


How did you integrate Claude into your shell?


We've had the opposite experience, especially with o3-mini using Deep Research for market research & topic deep-dive tasks. The sources that are pulled have never been 404 for us, and typically have been highly relevant to the search prompt. It's been a huge time-saver. We are just scratching the surface of how good these LLMs will become at research tasks.


People who don't work in tech have no idea how hard it is to do certain things at scale. Skilled tech people are severely underappreciated.

From a sub-tweet:

>> no LLM should ever output a url that gives a 404 error. How hard can it be?

As a developer, I'm just imagining a server having to call up all the URLs to check that they still exist (and the extra costs/latency incurred there)... And if any URLs are missing, getting the AI to re-generate a different variant of the response, until you find one which does not contain the missing links.

And no, you can't do it from the client side either... It would just be confusing if you removed invalid URLs from the middle of the AI's sentence without re-generating the sentence.

All this stuff adds complexity, cost, latency.
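A rough sketch of what that server-side check would entail (the timeout and worker count are arbitrary assumptions): even parallelized across a thread pool, every response now pays at least one extra network round trip, and any broken link would trigger a full re-generation on top of that.

```python
import concurrent.futures
import urllib.request

def url_ok(url, timeout=5):
    """HEAD-check one URL; any network or parse error counts as broken."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

def broken_links(urls, max_workers=8):
    """Check candidate URLs concurrently; return the subset that failed.
    This is the added cost being described: one round trip per link,
    per generated draft, before the response can be shown to the user."""
    if not urls:
        return []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(url_ok, urls))
    return [u for u, ok in zip(urls, results) if not ok]
```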











