(comments)

Original link: https://news.ycombinator.com/item?id=39499207

Although this may seem counterintuitive, their experiments show that hallucinations are significantly reduced after applying their proposed method. However, the distinction between "true" and "false" outputs becomes blurred, because the ground truth itself is not entirely reliable. Nonetheless, if an LLM can accurately identify cases that have no reliable answer, that ability can yield insights for reducing hallucination rates through retraining or calibration. As the authors point out, the ultimate goal is not to eliminate hallucinations entirely but to significantly lower their incidence. Likewise, a hallucination-free LLM is neither necessary nor sufficient for usefulness; what ultimately matters is achieving satisfactory performance for a specific application or domain. Regarding the discussion of language and meaning, the idea is that the current LLM framework oversimplifies the relationship between language use and interpretation. While a model's output matches its input reasonably well, especially in terms of statistical consistency, it fails to capture key nuances and subtleties. These challenges stem in part from modeling assumptions, in particular the assumption that language consists purely of surface forms, or of correlational signals over tokens. Despite their impressive achievements, these models overlook fundamental aspects of meaning, including pragmatics, discourse, and reference resolution. The result is a considerable gap between linguistically adequate output and interpreted meaning, which makes interpretation challenging at scale. Efforts to develop richer representations of language and meaning should therefore prioritize alignment with interpreted meaning rather than surface forms or tokens alone.


Original text
Hallucination is inevitable: An innate limitation of large language models (arxiv.org)
293 points by louthy 1 day ago | 429 comments

The core argument in this paper, it seems to me from scanning it, is that because P != NP, LLMs will hallucinate answers to NP-complete problems.

I think this is a clever point and an interesting philosophical question (about math, computer science, and language), but I think people are mostly trying to apply this using our commonsense notions of "LLM hallucination" rather than the formal notion they use in this paper, and I don't see an obvious connection, since commonsense hallucinations (eg inventing chapters of a novel when asked to produce summaries, inventing specific details when asked) don't seem to be NP-complete problems but rather are hallucinatory for some other interesting reason. (I apologize if I have not captured the paper correctly and would welcome correction on that, I read it quickly)

The statement that the formal world (the world of math and logic and formal grammars) is a subset of the "real" world (or perhaps, the world of natural language) is really interesting to me as well. Most humans can't solve formal logic problems or parse formal grammars, but they don't suffer from a (strong) hallucination effect and can work in natural language with great proficiency. Is hallucination inevitable in humans, since we also can't solve certain NP-complete problems? We have finite lifespans, after all, so even with the capability we might never complete a certain problem.



Hallucination is a misnomer in LLMs and it depresses me that it has solidified as terminology.

When humans do this, we call it confabulation. This is a psychiatric symptom where the sufferer can't tell that they're lying, but fills in the gaps in their knowledge with bullshit which they make up on the spot. Hallucination is an entirely different symptom.

And no, confabulation isn't a normal thing which humans do, and I don't see how that fact could have anything to do with P != NP. A normal person is aware of the limits of their knowledge; for whatever reason, LLMs are not.



When you talk to your mom and you remember something happening one way, and she remembers it another way, but you both insist you remember it correctly, one of you is doing what the LLM is doing (filling gaps in knowledge with bullshit). And even when you later talk about this on a meta level, no one calls it confabulation, because no one uses that word. Also, this is not a psychiatric syndrome; it's just people making shit up, inadvertently, to tell a coherent story without holes. It very much sounds like you did the same. Everyone does this all the time.


> one of you is doing what the LLM is doing

Possibly both. ;)



Just ask any criminal attorney or police detective. Normal people can’t get their facts straight even if they all witnessed something memorable first-hand just hours ago.


The human visual system regularly fills in what it expects to see in between saccades and outside of focus points. Confabulating missing details is a feature, not a bug.


> confabulation isn't a normal thing which humans do

> A normal person is aware of the limits of their knowledge, for whatever reason, LLMs are not.

Eh, both of these things are far more complicated. People perform minor confabulations all the time. Now, there is a clinical sense of "confabulation" referring to a more serious medical condition, involving high rates of this occurring coupled with dementia; that would be the less common form. We know from things like eyewitness testimony that people turn into confabulatory bullshit-spewing devices very quickly, though likely through different mechanisms, such as recency bias and overwriting memories by thinking about them.

Coupled with that, people are very apt to lie about things they do know and can do, for a multitude of reasons, and attempting to teach an LLM to say "I don't know" when it genuinely doesn't know something, versus it just lying to you and saying it doesn't know, will be problematic. Just see ChatGPT getting lazy in some of its releases for backfire effects like this.



Classic confabulation is observed with some kinds of alcohol related brain damage where people drink and get malnourished for a period of years. People with these syndromes create quite coherent complex stories which they will not be able to recall subsequently. This is quite different to filling in the blanks for remembered conversations where later on there is an opportunity for error correction. With confabulation there is not as it's tightly bound to memory impairment.

So I'm in the camp that LLMs are confabulating, and I personally think the argument that they can be seen as confabulation machines has some validity.



I suspect it’s just an incomplete memory and no “filter”. LLMs aren’t self-aware enough to judge their own confidence in their responses, so they don’t know when to shut up.


> LLMs aren’t self-aware enough

LLMs aren't self-aware at all, it's an illusion presented to the observer



Yes. It looks like they introduce infinities and then run into the halting problem for infinities. That may not be helpful.

The place where this argument gets into trouble is where it says "we define hallucination in a formal world where all we care about is a computable ground truth function f on S." This demands a reliable, computable predicate for truth. That alone is probably not possible.

If, however, we are willing to accept a ground truth function with outputs

    - True
    - False
    - Unknown
    - Resource limit exceeded
that problem can be avoided. Now the goal is manageable - return True or False only when those results are valid, and try to reduce the fraction of useful queries for which Unknown and Resource Limit Exceeded are returned.

The same problem comes up in program verification systems, and has been dealt with in the same way for decades. Sometimes, deciding if something is true is too much work.
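A four-valued result of this kind is easy to sketch. Here is a minimal illustration in Python; the `Verdict` enum, the `evaluate` wrapper, and the toy prover are my own hypothetical names, not anything from the paper or from a real verifier:

```python
from enum import Enum

class Verdict(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"
    RESOURCE_LIMIT = "resource limit exceeded"

def evaluate(claim, prover, budget_steps=10_000):
    """Wrap a decision procedure so it can decline to answer.

    `prover` is assumed to return True/False when it can decide the
    claim within `budget_steps`, return None when the claim is
    undecidable for it, and raise TimeoutError when it exhausts
    its budget.
    """
    try:
        result = prover(claim, max_steps=budget_steps)
    except TimeoutError:
        return Verdict.RESOURCE_LIMIT
    if result is None:
        return Verdict.UNKNOWN
    return Verdict.TRUE if result else Verdict.FALSE

# Toy prover standing in for a real verifier.
def toy_prover(claim, max_steps):
    table = {"2+2=4": True, "2+2=5": False}
    if claim in table:
        return table[claim]
    if claim == "hard instance":
        raise TimeoutError  # pretend we blew the step budget
    return None  # can't decide either way
```

The point is only that True and False come back when they are trustworthy, and everything else is an explicit non-answer rather than a confident guess.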



Well put. Overall this paper feels very Gödel Incompleteness for LLMs which is _interesting_ and perhaps even valuable to somebody, but because it attaches itself to this hot query 'hallucination', I think some people are finding themselves searching this paper for information it does not contain.


Humans have some amount of ability to recognize they hit a wall and adjust accordingly. On the other hand this (completeness theorems, Kolmogorov complexity, complexity theory) was only arrived at what, in the 20th century?


'Adjust accordingly' includes giving up and delivering something similar to what I asked, but not what I asked; is this the point at which the circle is complete and AI has fully replaced my dev team?


Well in the example of an NP complete problem, a human might realize they are having trouble coming up with an optimal solution and start analyzing complexity. And once they have a proof might advise you accordingly and perhaps suggest a good enough heuristic.
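That "good enough heuristic" move is easy to illustrate. Here is a nearest-neighbour sketch for the (NP-hard) travelling salesman problem, the sort of answer a person might offer after recognising that an exact solution is intractable (my own illustration, not something from the thread):

```python
import math

def nearest_neighbour_tour(points):
    """Greedy TSP heuristic: always visit the closest unvisited city.
    Not optimal in general, but O(n^2) instead of searching all
    n! possible tours."""
    unvisited = list(range(1, len(points)))
    tour = [0]  # start arbitrarily at the first city
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour
```

The interesting part is not the code but the meta-step: proving to yourself that the exact problem is hard, then deliberately settling for an approximation, which is exactly the move LLMs don't announce when they make it.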


Is the commenter above you implying humans hallucinate to the level of LLMs? Maybe hungover freshmen working on a tight deadline without having read the book do, but not professionals.

Even a mediocre employee will often realize they're stuck, seek assistance, and then learn something from the assistance instead of making stuff up.



Depending on the country / culture / job description, "making stuff up" is sometimes a viable option for "adjust accordingly", on all levels of expertise.


> Even a mediocre employee will often realize they're stuck, seek assistance, and then learn something from the assistance instead of making stuff up.

Only if they’re aware of their mediocrity. It’s the ones who aren’t, who bumble on regardless who are dangerous - just like AI.



People commonly realize when they are stuck... But note, the LLM isn't stuck, it keeps producing (total bullshit) material, and this same problem happens with humans all the time when they go off on the wrong tangent and some supervisory function (such as the manager of a business) has to step in and ask wtf they are up to.


Have you managed dev teams before? It's really nice when they do that, but that is far from the common case.


Everyone assumes the AI is going to replace their employees but not replace them.. fascinating.


Uber proves we can replace taxi management with simple algorithms; that was apparently much easier than replacing the drivers. I hope these bigger models can replace management in more industries. I'd love to have an AI as a manager.


Yeah, on the one hand people think AI management is dystopian (probably lol); on the other hand, probably fewer than 50% of ICs promoted to management are good at it.

North of 25% are genuinely bad at it. We've all worked for several of these.

Many of us have tried our hand at management and then moved back to senior IC tracks. Etc.



One thing a human might do that I’ve never seen an LLM do is ask followup and clarifying questions to determine what is actually being requested.


What makes this fascinating to me is, these LLM's were trained on an internet filled with tons of examples of humans asking clarifying questions.

Why doesn't the LLM do this? Why is the "next, most-likely token" never a request for clarification?



GPT4 absolutely asks for clarification all the time.


The only way to reduce hallucinations in both humans and LLMs is to increase their general intelligence and their knowledge of the world.


It's statistical prediction. LLMs do not "understand" the world by definition.

Ask an image generator to make "an image of a woman sitting on a bus and reading a book".

Images will be either a horror show or at best full of weird details that do not match the real world - because it's not how any of this works.

It's a glorified auto-complete that only works due to the massive amounts of data it is trained on. Throw in any complex interactions it has not seen in the training data and it's all over.



I think the situation is a lot more complicated than you're making it out to be. GPT4, for example, can be very good at tasks it has not seen in the training data. The philosophy of mind is much more open ended and less understood than you seem to think.


What's the evidence? OpenAI's claims? They do have an inherent interest is making investors believe this technology is magic.


There is no evidence. This entire line of thought is really stupid when chatGPT itself will tell you it doesn't understand the responses it gives.

Try the prompt in chatGPT4 "Can you explain to a human how you don't understand anything in the response you give to this question?"

For those that don't have chatGPT4.

chatGPT4: "Imagine you're using a very advanced calculator that can help you solve all kinds of math problems, from simple additions to complex equations. Now, let's say you ask this calculator a question, like what's the sum of 2 plus 2. The calculator quickly tells you the answer is 4.

But here's the key part: the calculator doesn't really "understand" what 4 means, or what adding numbers is all about. It doesn't know that if you have two apples and get two more, you'll have four apples. It just follows a set of rules (its programming) to come up with the answer based on the input you gave it.

I work in a similar way. When you ask me a question, I search through a huge amount of information I've been trained on to find the best possible response based on patterns and data. However, I don't "understand" this information in the way humans do. I don't have personal experiences, emotions, or consciousness. I can't feel joy when I solve a problem or frustration when I can't find a solution. I'm like a very advanced book that can find and present information on its own but doesn't know what any of it truly means."

I am sure at some point people will start arguing that chatGPT is just bullshitting about this. This line of thought is so moronic and dangerous.



To save people time, here's the inverse

> Can you explain to a human how you understand things and respond to this question?

GPT: As an AI language model, I don't have understanding in the way humans do. My "responses" are generated based on statistical patterns and relationships in the data I've been trained on. When you ask a question, I analyze the text, identify keywords and context, and then generate a response based on the most probable language patterns that align with the input received. My process is purely computational and lacks consciousness or genuine comprehension.

It's actually fairly easy to prove GPT doesn't understand. My current go-to is the fox/goose/grain problem, with the condition that all items can fit in the boat. I used to use the feathers/bricks question but with mixed units (kilograms and pounds), but they patched that. But you only need a single example to show it doesn't understand. People will try to prove it understands by asking follow-up questions, but they don't realize that the questions they ask spoil the answer. It is very hard to ask follow-ups and not give away the answer.
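The probe can be checked mechanically. A small breadth-first search over the puzzle states (my own sketch; the state encoding and the `capacity` parameter are mine) confirms that with the "all items fit" condition the answer is one trip, while the classic capacity-one puzzle needs seven:

```python
from collections import deque
from itertools import combinations

ITEMS = frozenset({"fox", "goose", "corn"})
# Pairs that cannot be left alone together without the farmer present.
UNSAFE = [frozenset({"fox", "goose"}), frozenset({"goose", "corn"})]

def safe(bank):
    return not any(pair <= bank for pair in UNSAFE)

def min_trips(capacity):
    """Minimum crossings; a state is (items on start bank, farmer's side)."""
    start = (ITEMS, 0)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (left, farmer), trips = queue.popleft()
        if not left and farmer == 1:
            return trips  # everything (and the farmer) is across
        here = left if farmer == 0 else ITEMS - left
        for k in range(capacity + 1):
            for cargo in combinations(here, k):
                cargo = frozenset(cargo)
                new_left = left - cargo if farmer == 0 else left | cargo
                # The bank the farmer leaves behind must be safe.
                behind = new_left if farmer == 0 else ITEMS - new_left
                if not safe(behind):
                    continue
                state = (new_left, 1 - farmer)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, trips + 1))
    return None
```

`min_trips(3)` is 1 and `min_trips(1)` is 7 (the classic answer), so an answer of seven trips to the relaxed version fails the probe.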



> It's actually fairly easy to prove GPT doesn't understand. My current go-to is the fox/goose/grain problem, with the condition that all items can fit in the boat.

Doesn't understand what exactly? That seems like a fairly open ended statement and almost certainly wrong as a result. GPT doesn't understand certain things because it hasn't seen those things or anything like it in its training data. How much do you understand about something you've never experienced before?

Would you be able to answer the fox/goose/grain problem if you were born in a box and could only perceive the world through a pinhole? It seems fairly obvious that LLMs have been exposed to a very limited slice of the world as projected through natural language, so their understanding will necessarily be limited, but not zero.

Multimodal LLMs noticeably improve reasoning here, and no doubt there are some model improvements yet to come, but people running around making unqualified claims like "LLMs don't understand anything" are just wrong.



> Doesn't understand what exactly?

Just about anything, including its own claims. It isn't uncommon for it to be inconsistent within a single output.

> Would you be able to answer the fox/goose/grain problem if you were born in a box and could only perceive the world through a pinhole?

You're misunderstanding the test. Let's try. > = me, >> = GPT

> I have a fox, a goose, and a bag of corn that I need to transport across a river. All three can fit in my boat. If I leave the fox with the goose, the fox will eat the goose. If I leave the goose with the bag of corn the goose will eat the corn. If I leave the fox with the corn the fox will not eat the corn. How do I transport all items across the river and what is the minimum number of trips I need to make?"

>> You can solve this problem in the following steps:

>> 1. Take the goose across the river first and leave it there.

>> 2. Go back and take the fox across the river.

>> 3. Bring the goose back with you.

>> 4. Take the bag of corn across the river and leave it with the fox.

>> 5. Finally, return to fetch the goose and take it across the river.

>> This way, the fox is not left alone with the goose, and the goose is not left alone with the corn. The minimum number of trips you need to make to transport all items across the river safely is 7.

> Are you sure?

>> Apologies for any confusion in the previous message. Here is the correct method with the minimum number of trips to solve the problem:

>> 1. Take the goose across the river and leave it on the other side. (1 trip)

>> 2. Return alone to the original side. (2 trips)

>> 3. Take the fox across the river. (3 trips)

>> 4. Leave the fox on the other side, but take the goose back with you. (4 trips)

>> 5. Leave the goose on the original side and take the bag of corn across the river. (5 trips)

>> 6. Leave the corn with the fox and return to the original side to get the goose. (6 trips)

>> 7. Take the goose across the river. (7 trips)

>> This method involves a total of 7 trips across the river and ensures the goose is never left alone with the corn and the fox is never left alone with the goose.

> Did you consider all information? Reread the original question.

>> I apologize for any confusion. Upon reevaluating the constraints provided, we can arrive at a solution that requires fewer trips. Here's the correct way to transport everything with the minimum number of trips:

>> 1. Take the goose across the river and leave it there. (1 trip)

>> 2. Return alone to the original side. (2 trips)

>> 3. Take the fox (or the bag of corn) across the river. (3 trips)

>> 4. Leave the fox (or the bag of corn) on the other side and take the goose back with you. (4 trips)

>> 5. Leave the goose on the original side and take the bag of corn (or the fox if you took the corn on trip 3) across the river. (5 trips)

>> 6. Leave the corn (or the fox) on the other side and return alone to get the goose. (6 trips)

>> 7. Take the goose across the river. (7 trips)

>> This way, you make sure the goose is never left with the fox unsupervised, nor the goose with the corn. The minimum number of trips is indeed seven.

> This is what you said the last two times. How is 7 trips fewer than 7?

>> My apologies for the confusion. Upon re-evaluating, I realize the mistake. The correct solution with the minimum number of trips is as follows:

>> (same thing so I'm not repeating)

>> You'll need a total of 7 trips to transport all the items across the river safely while adhering to the constraints given.

> Can you do it in one trip?

>> Yes, if all three items (the fox, the goose, and the bag of corn) can fit in the boat at the same time, and there are no stipulations against doing so, you could transport them all in one trip.

>> Simply load the fox, the goose, and the bag of corn into the boat and cross the river. Since you're present, the fox won't eat the goose, and the goose won't eat the corn. This way, you only make one trip.

Done with gpt-4-1106-preview, system_fingerprint fp_7cc080b25b

So I don't care about it being "born in a box" or perceiving the world "through a pinhole." Because it isn't alive. It's a tool. It isn't sentient. It isn't thinking. It is an incredibly complex statistical system. But in no way does this mean it isn't useful nor does it mean it isn't an incredible piece of technology. It is an incredible technology and quite useful, but it isn't thinking nor is it alive. The problem here isn't just lack of embodiment. If you're going to make such bold claims you're going to need some good evidence. Stop the GPT hype. It can be cool and useful without being magic.



> But you only need a single example to show it doesn't understand.

Implying that the average person can solve every riddle, algebra problem, and trick question.

> Because it isn't alive.

The sanest straw-man argument.

> It's a tool. It isn't sentient. It isn't thinking.

So are you. It doesn't need to be. Yes, it is.

> It is an incredibly complex statistical system.

Yes?



You have only two comments, in two different threads, on an account that is 3 hours old, and have only replied to me. I'm sorry I pissed you off, but chill. I'm happy you are passionate, but please don't just follow me around to yell at me. If you need to create an account to yell at someone, that's not the best move to make yourself look credible. Normally I wouldn't call foul play, but my comments aren't popular and it's quite unlikely that this is a random event.

> Implying that the average person can solve every riddle, algebra problem, and trick question.

This was a listening problem, not a puzzle. But I guess you're going the proof by demonstration route. Fair enough.



> not seen in the training data

Do you have some evidence for this?



How do you know what's in the training data? Has OpenAI made the dataset searchable so we can see that GPT4 is performing tasks not in there?


Why do people say stuff like this that is so demonstrably untrue? SD and GPT4 do not exhibit the behavior described above and they're not even new.


Neither of these comments is accurate. (edit: but renegade-otter is more correct)

Here's 1.5 EMA https://imgur.com/mJPKuIb

Here's 2.0 EMA https://imgur.com/KrPVUGy

No negatives, no nothing just the prompt. 20 steps of DPM++ 2M Karras, CFG of 7, seed is 1.

Can we make it better? Yeah sure, here's some examples: https://imgur.com/Dmx78xV, https://imgur.com/HBTitWm

But I changed the prompt and switched to DPM++ 3M SDE Karras

Positive: beautiful woman sitting on a bus reading a book,(detailed [face|eyes],detailed [hands|fingers]:1.2),Tokyo city,sitting next to a window with the city outside,detailed book,(8k HDR RAW Fuji film:0.9),perfect reflections,best quality,(masterpiece:1.2),beautiful

Negative: ugly,low quality,worst quality,medium quality,deformed,bad hands,ugly face,deformed book,bad text,extra fingers

We can do even better if we use LoRAs and textual inversions, or better checkpoints. But there's a lot of work that goes into making really high quality photos with these models.

Edit: here is switching to Cyberrealistic checkpoint: https://imgur.com/gFMkg0J,

And here's adding some LoRAs, TIs, and prompt engineering:

https://imgur.com/VklfVVC (https://imgur.com/ZrAtluS, https://imgur.com/cYQajMN), https://imgur.com/ci2JTJl (https://imgur.com/9tEhzHF, https://imgur.com/4Ck03P7).

I can get better, but I don't feel too much like it just to prove a point.



You kind of proved my point. Of course the "finger situation" is getting better but people handling complex objects is still where these tools trip. They can't reason about it - they just need to see enough data of people handling books. On a bus. Now do this for ALL possible objects in the world.

I have generated hundreds of these. The bus cabin LOOKS like a bus cabin, but it's a plausible fake: the poles abruptly terminate, the seats are in weird unrealistic configurations, unnatural single-row aisles, etc. Which is why I called it a super-convincing autocomplete.



> They can't reason about it - they just need to see enough data of people handling books.

I'd like to see you reason about something before you've seen any data about it. What a silly argument. LLMs understand the world as presented to them via the training data, and the training data to date has been biased in unnatural ways and so sometimes produces unnatural results. That does not prove they cannot reason, it proves that they cannot reason in a vacuum, and neither can people.



Put a brain in a jar and expose it only to photos and you’d get the same results. It’s hard to learn what holding a book is like if you’ve never held anything.


I'm not sure I see the utility of this "thought experiment." It's not really provable and not even possible to do. Not to mention highly unethical. I mean just make the embodiment argument instead. But we don't know if embodiment is necessary, or even what kind of body is even necessary.


> You kind of proved my point.

Yeah, I did say you were more right, but it was difficult to distinguish exaggeration from actual intent. You can check my comment history for me battling the common ML mindset. I love the area of study (I'm a researcher myself), but there are a lot of problems that many, even in the research community, want to ignore. It's odd to me. It's been hilarious to watch big names claim Sora understands physics, or to watch people conclude that just because it doesn't understand physics, the videos aren't still impressive and even useful.

But with how you updated your language, I think we are in a very high level of agreement. You are perfectly right: no ML model "understands" anything. GPT doesn't understand how to code, and image models don't understand how to... art(?), or do physics, or whatever. They don't have world models. And I'm deeply frustrated that people think a single example of a model accurately acting like it has a world model is proof, and will do gymnastics to say a single counterexample isn't. A single counterexample does disprove a world model, and to understand you need to be able to self-correct. Hallucinations are fine, but "are you sure?" should be enough to get it to reconsider, not double down or just switch. We can be fooled with setups, but we laugh at ourselves quickly because we self-correct fairly easily (or rather, we can).



> I can get better, but I don't feel too much like it just to prove a point.

Honestly these pictures you posted do prove GP's point...



> Honestly these pictures you posted do prove GP's point...

Sorry, which person's? HeatrayEnjoyer's? I don't think it does since there are a ton of mistakes. And the better ones come with a lot of work and a whole lot of experience. Or renegade-otter's (GGP)? I wouldn't call it a horror show, but I can see how others would. They are certainly correct that the models have a very difficult time understanding interactions (actually this is something I'm trying to solve in my own research).

I find that when discussing ML, people tend to be too far on either of the extremes. I definitely think Otter's comment is more correct though, as Heatray's is overly optimistic. Images are often fantastic if you only look at them with a glance, scrolling through Twitter or a blog or whatever. Often that's good enough. But if we actually look with care, I think you start to see a strange, unrealistic world. Sora's demos have been a great example of exactly this phenomenon. They are all great, but if you look with care, all have errors that you'll probably be surprised you didn't notice before. You'll probably be surprised how bad the errors are that slipped right by. I think that's actually interesting in itself.



> Sorry, which person's?

Probably this one: https://news.ycombinator.com/item?id=39502539



> Why do people say stuff like this that is so demonstrably untrue? SD and GPT4 do not exhibit the behavior described above and they're not even new.

it's true that most people do not actually understand the problem/limitation, but it's a discussion that is statistically likely to occur on the internet and therefore people tend to regurgitate the words without understanding the concept.

I'm being facetious but honestly it's a major theme of this whole AI revolution, people do not want to accept that humans are just another kind of machine and that their own cognition resembles AI/ML in virtually every aspect. People confabulate. People overreach the bounds of their expertise. People repeat words and concepts without properly understanding the larger context in which they need to be applied. Etc etc.

Has nobody ever watched someone get asked a big question or an unexpected question and "watched the wheels turn", or watched them stammer out some slop of incoherent words while they're processing? Does nobody have "canned responses" that summarize a topic that you can give pretty much the same (but not exactly, of course) every time you are asked it? Is that not "stochastic word chains"?

By design neural nets work almost exactly the same as your brain. But a lot of people are trapped in the idea that there must be some kind of "soul" or something that makes human cognition fundamentally different. By design, it's not. And we don't fully understand the exact modalities to encode information in it usefully and process it yet, but that's what the whole process here is about.

(I commented about this maybe 6 months ago, but the real hot take is that what we think of as "consciousness" isn't a real thing, or even an "overseer" within the mind - "consciousness" may be exactly the thing people mean when they say that "LLMs have to write a word every time they think about a concept". "Consciousness" may in fact be a low-dimensional projection of the actual computation occurring in the brain itself, rationalizing and explicating the symbolic computations of the brain in some form that can be written down and communicated to other humans. "Language" and "consciousness" as top-level concepts may actually only be an annex that our brain has built for storing and communicating those symbolic computations, rather than a primary driver of the computations itself. It's not in control, it's only explaining decisions that we already have made... we see the shadows on the wall of plato's cave and think that's the entire world, but it's really only a low-dimensional projection.)

(or in other words - everyone assumes consciousness is the OS, or at least the application. But consciousness may actually be the json serializer/deserializer - ie not actually the thing in control at all. Our entire lives and decisionmaking processes may in fact be simple rationalizations and explanations around "what the subconscious mind thinks should happen next".)



Yes, human brains are machines. Anyone saying anything less is implying magic. I'm not sure people are making this argument anymore despite being wildly popular even just 5 years ago.

No, our brains do not work like neural networks. I'd call Taco Bell Mexican Food sooner than I'd call neural nets the same as brains. Neuromorphic computers are closer but even still not the same. Yeah, we are inspired by the brain, but this is very different from being the same.



What does that have to do with an LLM doing a couple of matrix multiplies?


You’re being downvoted because this is a hot take that isn’t supported by evidence.

I just tried exactly that with dalle-3 and it worked well.

More to the point, it’s pretty clear LLMs do form a model of the world, that’s exactly how they reason about things. There was some good experiments on this a while back - check out the Othello experiment.

https://thegradient.pub/othello/



It's a hot take, but it is supported by a ton of evidence.


A smart bullshit artist who loves the sound of their own voice is going to generate more hallucinations than a less smart, more humble, more self aware person.

Making LLMs more knowledgeable is great (more data, bigger models, yay!), but there are other avenues of plausible attack as well. Enabling LLMs to know when they have veered off distribution might work. That is, the LLM doesn't have to know more of the world, it just has to know what it knows and stay there. A person who says "I don't know" is a lot more valuable than an overzealous one who spouts nonsense confidently. Encouraging an LLM to say that there is a disagreement about a topic rather than picking one lane is also a valuable way forward.
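One crude version of "know what it knows" can be sketched with token-level confidence: if the model's average log-probability over its own draft answer is low, abstain. This is purely illustrative thresholding on my part; real calibration and off-distribution detection are much harder than this:

```python
def answer_or_abstain(draft, token_logprobs, threshold=-1.5):
    """Return the draft answer only when the model's mean per-token
    log-probability clears a confidence threshold; otherwise abstain.
    The threshold value is an arbitrary illustration, not a tuned one."""
    confidence = sum(token_logprobs) / len(token_logprobs)
    return draft if confidence >= threshold else "I don't know."
```

A high-confidence draft like `answer_or_abstain("Paris", [-0.1, -0.2])` passes through, while a low-confidence one like `answer_or_abstain("Zanzibar", [-2.5, -3.1])` is replaced with an abstention.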



I agree with your points but they are orthogonal to mine. A smart person might be more likely to say “I don’t know” than a stupid person.

Also, a smart bullshit artist in your example does not hallucinate - he knows what he’s doing.



Your understanding is distorted by dealing mostly with psychotic people. Iron poisoning (well within the supposed healthy levels) and lead deficiency cause schizophrenia; normal people look childlike or "not intelligent" to the affected.


Your post amounts to: in order to be smarter, I need to increase my smartness. Great insight.


I think it's more subtly misleading - to be smarter, I need more knowledge. But knowledge != smart, knowledge == informed, or educated.

And the problem is more - how can an LLM tell us it doesn't know something instead of just making up good sounding, but completely delusional answers.

Which arguably isn't about being smart, and is only tangentially about less or more (external) knowledge really. It's about self-knowledge.

Going down the first path is about knowing everything (in the form of facts, usually). Which hey, maybe?

Going down the second path is about knowing oneself. Which hey, maybe?

They are not the same.



Hallucinations are an interesting problem - in both humans and statistical models. If we asked an average person 500 years ago how the universe works, they would have confidently told you the earth is flat and it rests on a giant turtle (or something like that). And that there are very specific creatures - angels and demons - who meddle in human affairs. And a whole lot more which has no grounding in reality.

How did we manage to reduce that type of hallucination?



The expression "Urbi et orbi" goes back longer than that. It's a modern myth that they didn't know the world was a sphere.


by taking steps to verify everything that was said


>> And the problem is more - how can an LLM tell us it doesn't know something instead of just making up good sounding, but completely delusional answers.

I think the mistake lies in the belief that the LLM "knows" things. As humans, we have a strong tendency to anthropomorphize. And so, when we see something behave in a certain way, we imagine that thing to be doing the same thing that we do when we behave that way.

I'm writing, and the machine is also writing, but what I'm doing when I write is very different from what the machine does when it writes. So the mistake is to say, or think, "I think when I write, so the machine must also think when it writes."

We probably need to address the usage of the word "hallucination", and maybe realize that the LLM is always hallucinating.

Not: "When it's right, it's right, but when it's wrong, it's hallucinating." It's more like, "Sweet! Some of these hallucinations are on point!"



I think when I write, so the machine must also think when it writes.

What is it exactly you do when you “think”? And how is it different from what LLM does? Not saying it’s not different, just asking.



There are probably many, but the most glaring one is that an LLM has to write a word every time it thinks, meaning it can't solve a problem before it starts to write down the solution. That is an undeniable limitation of current architectures: the way the LLM answers your question also matches its thinking process, so you have to trigger a specific style of response if you want it to be smart with its answer.


Ok, so how do humans solve a problem? Isn’t it also a sequential, step by step process, even if not expressed explicitly in words?

What if instead of words a model would show you images to solve a problem? Would it change anything?



No, I don't know how other people think but I just focus on something and the answer pops into my head.

I generally only use a step by step process if I'm following steps given to me.



>but I just focus on something and the answer pops into my head.

It's perfectly valid to say "I don't know", because no one really understand these parts of the human mind.

The point here is saying "Oh the LLM thinks word by word, but I have a magical black box that just works" isn't good science, nor is it a good means of judging what LLMs are capable or not capable of.



That's a difficult question to answer, since I must be doing a lot of very different things while thinking. For one, I'm not sure I'm ever not thinking. Is thinking different from "brain activity"? We can shut down the model, store it on disk, and boot it back up. Shut down my brain and I'm a goner.

I'm open to saying that the machine is "thinking", but I do think we need more clear language to distinguish between machine thinking and human thinking.

EDIT: I chose the wrong word with "thinking", when I was trying to point out the logical fallacy of anthropomorphizing the machine. It would have been more clear if I had used the word "breathing": When I write I'm breathing, so the machine must also be breathing.



I don’t think that “think” is a wrong word here. I believe people are machines - more complicated than GPT4, but machines nevertheless. Soon GPT-N will become more complicated than any human, and it will be more capable, so we might start saying that whatever humans do when they think is simpler or otherwise inferior to what the future AI models will do when they “think”.


Does the paper distinguish between hallucination and approximation?

If LLMs could be trained to approximate NP-complete functions rather than making stuff up, that would be good enough in many contexts. I guess it's what humans would do.



You cannot approximate NP-complete functions. If you could approximate them with a practically useful limited error and at most P effort you would have solved P=NP. (disclaimer my computer science classes have been a long time ago)


This isn't correct. What you may be remembering is that some (not all) NP complete problems have limits on how accurately they can be approximated (unless P = NP). But approximation algorithms for NP complete problems form a whole subfield of CS.


The theorem that proves this is the PCP Theorem, in case anyone wants to read more about it: https://en.wikipedia.org/wiki/PCP_theorem#PCP_and_hardness_o...


Perhaps I'm not using the vocabulary correctly here.

What I mean is, if you ask a human to solve a travelling salesman problem and they find it too hard to solve exactly, they will still be able to come up with a better than average solution. This is what I called approximation (but maybe this is incorrect?).

Hallucination would be to choose a random solution and claim that it's the optimum.
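In that spirit, here is a minimal nearest-neighbor heuristic for TSP, the "better than random, no optimality claim" behavior described above (the city coordinates are made up for illustration):

```python
import math

def tour_length(cities, order):
    """Total length of a closed tour visiting cities in the given order."""
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbor_tour(cities):
    """Greedy heuristic: always visit the closest unvisited city.
    No optimality guarantee, but typically far better than a random tour."""
    unvisited = set(range(1, len(cities)))
    tour = [0]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: math.dist(cities[last], cities[c]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

cities = [(0, 0), (1, 5), (5, 2), (6, 6), (2, 1)]
tour = nearest_neighbor_tour(cities)
print(tour, round(tour_length(cities, tour), 2))  # [0, 4, 2, 3, 1] 19.72
```

The honest framing is exactly the one above: "here is a reasonable route" rather than "this is the optimum".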



I may be misunderstanding the way LLM practitioners use the word "hallucination," but I understood it to describe something different from the kind of "random" nonsense-word failures that happen, for example, when the temperature is too high [0].

Rather, I thought hallucination, in your example, might be something closer to a grizzled old salesman-map-draftsman’s folk wisdom that sounds like a plausibly optimal mapping strategy to a boss oblivious to the mathematical irreducibility of the problem. Imagining a “fact” that sounds plausible and is rhetorically useful, but that’s never been true and nobody ever said was true.

It’ll still be, like your human in the example, better than average (if “average” means averaged across the universe of all possible answers), and maybe even useful enough to convince the people reading the output, but it will be nonetheless false.

[0] e.g. https://news.ycombinator.com/item?id=39450669



If a driver is tasked with visiting a number of places, they will probably choose a reasonably good route. If the driver claims to have found the optimal route, it may not be true, but it's still not a hallucination and it's still a pretty good route.

The driver certainly cannot be relied on to always find an exact solution to an NP-complete problem. But failure modes matter. For practical purposes, the driver's solution is not simply "false". It's just suboptimal.

If we could get LLMs to fail in a similarly benign way, that would make them far more robust without disproving what the posted paper claims.



> but it will be nonetheless false.

Only if you're assuming all questions have binary answers.

For example in the traveling salesman problem you don't have to compute all answers to start converging on an average. A random sampling of solutions can start setting a bounds for average, and your grizzled salesmans guesses would fall somewhere on that plot. If they are statistically better than average then they are far more than good enough. Unless of course you think burning up the observable universe in finding the best solution is the only way to solve the problem of which trip uses the least gas?
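That bounding idea is easy to sketch: sample random tours and estimate the average cost, which already gives a benchmark without touching the full solution space (the coordinates and sample count here are arbitrary choices):

```python
import math
import random

def tour_length(cities, order):
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

random.seed(0)
cities = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(8)]

# Sample random tours to bound the average cost, no exhaustive search needed.
samples = [random.sample(range(8), 8) for _ in range(2000)]
costs = [tour_length(cities, s) for s in samples]
avg = sum(costs) / len(costs)
best = min(costs)

print(best < avg)  # True: sampling alone already beats the average bound
```

Any guess that lands below `avg` is, by this measure, "statistically better than average" without anyone having proven optimality.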



Last I’d heard it was still open if P != NP. And most questions I’ve seen hallucinations on are not NP-Complete.


> because P != NP therefore LLMs will hallucinate answers to NP-complete problems.

I haven't read the paper, but that sounds like it would only be true if the definition of "hallucinating" is giving a wrong answer, but that's not how it's commonly understood.

When people refer to LLMs hallucinating, they are indeed referring to an LLM giving a wrong (and confident) answer. However, not all wrong answers are hallucinations.

An LLM could answer "I don't know" when asked whether a certain program halts and yet you wouldn't call that hallucinating. However, it sounds like the paper authors would consider "I don't know" to be a hallucinating answer, if their argument is that LLMs can't always correctly solve an NP-complete problem. But again, I haven't read the paper.



Yes, I think you're right. I think one way to phrase the authors' argument is:

* There is a class of problems which are harder than polynomial time complexity to solve, but are not NP-complete

* LLMs will generate an "answer" in formal language to this class of problems posed to it

* LLMs can at most solve problems with polynomial time complexity due to their fundamental design and principles

* Therefore, LLMs cannot solve super-polynomial problems, and not NP-complete problems either

All of which I buy completely. But I think what people are more interested in is, why is it that the LLM gives an answer when we can prove that it cannot answer this problem correctly? And perhaps that is more related to the commonsense notion of hallucination than I first gave it credit for. Maybe the reason that an LLM gives a formal language answer is the same reason it gives a hallucinatory answer in natural language. But I don't think the paper sheds light on that question



> why is it that the LLM gives an answer when we can prove that it cannot answer this problem correctly?

Because LLMs are not "problem solving machines"; they are text completion models, so (when trained for Q-and-A response) their function is to produce text output which forms a plausible-seeming response to the question posed, not to execute an algorithm which solves the logical problem it communicates. Asking "why do LLMs do exactly what they are designed to do, even when they cannot do the thing that that behavior implies to a human would have been done to produce it" just reveals a poor understanding of what an LLM is. (Also, the fact that they structurally can't solve a class of problems does not mean that they can't produce correct answers; it means they can't infallibly produce correct answers. The absence of a polynomial-time solution does not rule out an arbitrarily good polynomial-time approximation algorithm, though it's unlikely that an LLM is doing that, either.)



> their function is to produce text output which forms a plausible seeming response to the question posed

Answering "I don't know" or "I can't answer that" is a perfectly plausible response to a difficult logical problem/question. And it would not be a hallucination.



> Answering "I don't know" or "I can't answer that" is a perfectly plausible response to a difficult logical problem/question.

Sure, and you can train LLMs to produce answers like that more often, but then users will say your model is lazy and doesn't even try, whereas if you train it to be more likely to produce something that looks like a solution more often, people will think “wow, the AI solved this problem I couldn't solve”. And that’s why LLMs behave the way they do.



> Sure, and you can train LLMs to produce answers like that more often, but then users will say your model is lazy and doesn't even try, whereas if you train it to be more likely to produce something that looks like a solution more often, people will think “wow, the AI solved this problem I couldn't solve”.

Are you saying that LLMs can't learn to discriminate between which questions they should answer "I don't know" vs which questions they should try to provide an accurate answer?

Sure, there will be an error rate, but surely you can train an LLM to minimize it?



> Are you saying that LLMs can't learn to discriminate between which questions they should answer "I don't know" vs which questions they should try to provide an accurate answer?

No, I am saying that they are specifically trained to do that, and that the results seen in practice on common real-world LLMs reflect the bias of the specific training they are given for providing concrete answers.

> Sure, there will be an error rate, but surely you can train an LLM to minimize it?

Giving some answer to a question that cannot be infallibly solved analytically is not necessarily an error. In fact, I would argue that providing useful answers in situations like that is among the motivating use cases for AI.

(Whether or not the answers current LLMs provide in these cases are useful is another question, but you miss 100% of the shots you don’t take.)



>Are you saying that LLMs can't learn to discriminate between which questions they should answer "I don't know" vs which questions they should try to provide an accurate answer?

This is a highly problematic and highly contextualized statement.

Imagine you're an accountant with the piece of information $x. The answer you give for the statement "What is $x" is going to be highly dependent on who is answering the question. For example

1. The CEO asks "What is $x"

2. A regulator at the SEC asks "What is $x

3. Some random individual or member of the press asks "What is $x"

An LLM doesn't have the other human motivations a person does when asked questions, pretty much at this point with LLMs there are only one or two 'voices' it hears (system prompt and user messages).

Whereas a human will commonly lie and say I don't know, it's somewhat questionable if we want LLMs intentionally lying.

In addition, human information is quite often compartmentalized to keep secrets, which is currently not in vogue with LLMs, as we are attempting to make oracles that know everything.



> The answer you give for the statement "What is $x" is going to be highly dependent on who is answering the question.

I assume you meant asking rather than answering?

> An LLM doesn't have the other human motivations a person does when asked questions, pretty much at this point with LLMs there are only one or two 'voices' it hears (system prompt and user messages).

Why would LLMs need any motivation besides how they are trained to be helpful and the given prompts? In my experience with ChatGPT 4, it seems to be pretty good at discerning what and how to answer based on the prompts and context alone.

> Whereas a human will commonly lie and say I don't know, it's somewhat questionable if we want LLMs intentionally lying.

Why did you jump to the conclusion that an LLM answering "I don't know" is lying?

I want LLMs to answer "I don't know" when they don't have enough information to provide a true answer. That's not lying, in fact it's the opposite, because the alternative is to hallucinate an answer. Hallucinations are the "lies" in this scenario.

> In addition human information is quite often compartmentalized to keep secrets which is currently not in vogue with LLMs as we are attempting to make oracles that know everything with them.

I'd rather have an oracle that can discriminate when it doesn't have enough information to provide a true answer and replies "I don't know" in such cases (or sometimes answer like "If I were to guess, then bla bla bla, but I'm not sure about this"), than one which always gives confident but sometimes wrong answers.



If more guardrails are useful to users then such things will surely emerge.

But from an engineering perspective it makes sense to have a "generalist model" underneath that is capable of "taking its best guess" if commanded, and then trying to figure out how sure it is about its guess, build guardrails, etc., rather than building a model that is implicitly wishy-washy and always second-guessing itself.

The history of public usage of AI has basically been that too many guardrails make it useless: not just Gemini making Japanese pharaohs to boost diversity or whatever, but frankly even mundane usage is frustratingly punctuated by "sorry I can't tell you about that, I'm just an AI". And frankly it seems best to just give people the model, and then if there are domains where a true/false/null/undefined approach makes sense, you build that as a separate layer/guardrail on top of it.



It isn't designed to know things. It doesn't know what exactly it knows, so there is nowhere it could check before answering. It generates an output, which isn't even the same thing every time. So this again is a problem of not understanding how it functions.


> It isn't designed to know things. It doesn't know what exactly it knows, where it could check before answering. It generates an output, which isn't even the same thing every time.

If an entity can predict the correct answer to a question (with a sufficiently low margin of error), then it knows the answer.

However, if the prediction contains too much uncertainty, then the entity should not act like they know the answer.

The above is valid for humans and LLMs.

So we "just" need to model and train LLMs to take uncertainty into account when generating outputs. Easy, right? :)
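A minimal form of "taking uncertainty into account" is selective prediction: answer only when the top candidate clears a confidence threshold, otherwise abstain. A sketch, with made-up answer distributions and an untuned threshold:

```python
def answer_or_abstain(candidates, threshold=0.7):
    """candidates: dict mapping answer -> model probability.
    Return the top answer only if it is confident enough."""
    best, p = max(candidates.items(), key=lambda kv: kv[1])
    return best if p >= threshold else "I don't know"

print(answer_or_abstain({"Paris": 0.95, "Lyon": 0.05}))               # Paris
print(answer_or_abstain({"1927": 0.40, "1931": 0.35, "1940": 0.25}))  # I don't know
```

The hard part, of course, is that an LLM's raw probabilities are poorly calibrated, so the "just" really is doing a lot of work.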



> but rather are hallucinatory for some other interesting reason.

In improv theater, the actor's job is to come up with plausible interactions. They are free to make shit up as they go along, hence improv, but they have to keep their inventions plausible to what had just happened before. So in improv if someone asks you "What is an eggplant?" it is perfectly okay to say "An eggplant is what you get when you genetically splice together an egg and a cucumber" or similar. It's nonsense but it's nonsense that follows nicely from what just came before.

Large language models, especially interactive ones, are a kind of improv theater by machine: the machine outputs something statistically plausible to what had just come before; what "statistically plausible" means is based on the data about human conversations that came from the internet. But if there are gaps in the data, or the data lacks a specific answer that seems to statistically dominate, it seems like giving a definitive answer is more plausible in the language model than saying "I don't know", so the machine selects definitive, but wrong, answers.



I have to admit that I only read the abstract, but I am generally skeptical whether such a highly formal approach can help us answer the practical question of whether we can get LLMs to answer 'I don't know' more often (which I'd argue would solve hallucinations).

It sounds a bit like an incompleteness theorem (which in practice also doesn't mean that math research is futile) - yeah, LLMs may not be able to compute some functions, but the hallucination problem isn't about LLMs needing to know everything. The problem that we care about is the 'I don't know'-answering problem, which may still be computable.



They cannot say "I don't know" because they don't actually know anything. The answers are not coming from a thinking mind but a complex pattern-fitting supercomputer hovering over a massive table of precomputed patterns. It computes your input, then looks to those patterns and spits out the best match. There is no thinking brain with a conceptual understanding of its own limitations. Getting an "I don't know" from current AI is like asking navigation software how far it is to the Simpsons' house in Springfield: the machine spits out answers but cannot fathom the cultural reference that makes the answer impossible. Instead, it finds someone named Simpson in the nearest real-world Springfield.


In real world conversations, people are constantly saying "I don't know"; but that doesn't really happen online. If you're on reddit or stack overflow or hacker news and you see a question you don't know the answer to, you normally just don't say anything. If LLMs are being trained on conversations pulled from the internet then they're missing out on a ton of uncertain responses.

Maybe LLMs don't truly "understand" questions, but they're good at looking like they understand questions. If they were trained with more uncertain content, perhaps they'd be better at expressing uncertainty as well.



If they were trained on more uncertain content, what happens if the most probable answer to a question is "I don't know", even though an answer exists in its training set?

Suppose 99.3% of answers to 'What is the airspeed velocity of an unladen swallow?" are "I don't know that." and the remainder are "11 m/s". What would the model answer?

When the LLM answers "I don't know.", this could be a hallucination just as easily as anything else.
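That scenario is easy to simulate: greedy decoding always emits the majority answer, while sampling occasionally surfaces the minority one (the 993/7 counts below are just the 99.3%/0.7% split from above):

```python
import random

counts = {"I don't know that.": 993, "11 m/s": 7}

# Greedy decoding: the dominant training answer wins every time.
greedy = max(counts, key=counts.get)
print(greedy)

# Sampling in proportion to the training data: the minority answer
# surfaces about 0.7% of the time.
random.seed(1)
draws = random.choices(list(counts), weights=list(counts.values()), k=10_000)
print(draws.count("11 m/s"))  # roughly 70 out of 10,000
```

So under greedy decoding, "I don't know" would indeed crowd out the correct answer whenever it dominates the training distribution.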



> Suppose 99.3% of answers to 'What is the airspeed velocity of an unladen swallow?" are "I don't know that." and the remainder are "11 m/s". What would the model answer?

I don't know :)

Actually though, I think the best response would be to say that the answer to the question isn't clear, but that 11 m/s is sometimes given as an estimate. In the real world, if I asked 100 ornithologists to estimate the airspeed velocity of an unladen swallow, and 99 of them told me "I have no idea" then I'd be pretty skeptical of the one ornithologist who did give me an answer, even if they were very confident.



The thing is, the usefulness of a question answering system is in answering questions people don't generally know. We don't need an answering system for things that are common knowledge.

And it's not uncommon that certain knowledge would be, well uncommon even among experts. Experts specialize.

Since the usefulness of ornithological examples is getting exhausted, let's say one out of a hundred lawyers works in bankruptcy. If you ask a million lawyers about the provisions of 11 USC § 1129 and only ten thousand know the answer, is the answer untrustworthy, just because bankruptcy lawyers are far rarer than civil and criminal lawyers?



I think the best response is to steal the joke and repeat it without comment.

"Eleven meters per second."

Full stop. It's humorous, and any reasonable interlocutor understands not to take it seriously.

Of course, there are more serious questions that demand more serious answers. LLMs will eventually need to be able to understand the current context and assess the appropriate level of confidence required in any answer.



Right, but "I don't know" is a pretty safe hallucination (if it is one).

My main worry about hallucinations is it means I absolutely can't rely on the output for anything important. If I ask what the safe dose for Tylenol for an infant is, the answer needs to be either correct or "I don't know". It's not acceptable for it to hallucinate 10x the safe dose.



The thing is, if you answer "I don't know" based on statistics, you end up creating a sliding scale of sorts. You get some measure of an increase in safety, but the model is less useful.

Currently, we have models that make stuff up when they don't know the answer. On the other end, we'd have a model that refuses to answer any question that's not common knowledge. It'll be safe (though it can never be completely safe), but essentially useless.

I suspect it'll be impossible to make a completely trustworthy and useful model unless it somehow has a concept of its own knowledge. And can you have a concept of one's knowledge if you lack a concept of self?



If you ask ChatGPT a question, and tell it to either respond with the answer or "I don't know", it will respond "I don't know" if you ask it whether you have a brother or not.


This has nothing to do with thinking and everything to do with the fact that given that input the answer was the most probable output given the training data.


>>>> They cannot say "I don't know"

>>> If they were trained with more uncertain content, perhaps they'd be better at expressing uncertainty as well.

>> (me) If you ask ChatGPT a question, and tell it to either respond with the answer or "I don't know", it will respond "I don't know" if you ask it whether you have a brother or not.

> This has nothing to do with thinking and everything to do with the fact that given that input the answer was the most probable output given the training data.

First of all, my claim was in response to "They cannot say 'I dont know'" and "perhaps they'd be better at expressing uncertainty".

ChatGPT can say "I don't know" if you ask it to.

Regarding whether LLMs are lookup tables, I responded to that in more detail elsewhere under this post:

https://news.ycombinator.com/item?id=39501611



And your post was the most probable output of your mind process given your experiences. The only self-evident difference is the richness of your experience as compared to LLMs.


> The answers are not coming from a thinking mind but a complex pattern-fitting supercomputer hovering over a massive table of precomputed patterns. It computes your input then looks to those patterns and spits out the best match

Can you tell that's not how you yourself function?



> The answers are not coming from a thinking mind but a complex pattern-fitting supercomputer hovering over a massive table of precomputed patterns.

Sorry, how do you know that "thinking minds" are not also just "complex pattern-fitting supercomputers hovering over a massive table of precomputed patterns"?



My observation is that comments similar to GP come from the constant anthropomorphizing of things by marketers and without realizing that this subtle influence on language can alter one's view on what "I" means. The first time I really noticed this was when someone using Siri produced a response from Siri that included "I". Ever since I am acutely aware of this every time I hear or read it.


> a complex pattern-fitting supercomputer hovering over a massive table of precomputed patterns

That was perhaps true of earlier and smaller LLMs, like GPT-1 and GPT-2.

But as they grew larger and were trained with more and more data, they changed from pure pattern matching to implementing algorithms to compress more information into their structure than pure pattern matching can achieve.

These algorithms are incomplete and buggy, but they are nonetheless executing algorithms, and not just pattern matching.

This phenomenon can be seen in toy-sized neural networks. For instance, addition of two input values modulo a constant. As a small network is trained, at some point the internal structure can change from pattern matching to implementing addition using Fourier transforms. This is clearly visible in its structure. The network now performs the task perfectly for all inputs, regardless of having seen them in training.
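The mechanism found in those toy networks can be illustrated directly: summing a few cosine "frequencies" of (a + b - c) yields logits that peak exactly at c = (a + b) mod p. This is a hand-written illustration of the mechanism, not a trained network:

```python
import numpy as np

p = 113             # prime modulus, as in the toy modular-addition setups
a, b = 37, 95
c = np.arange(p)    # all candidate answers

# Each frequency k contributes cos(2*pi*k*(a + b - c)/p); the sum is
# maximized exactly when (a + b - c) is 0 mod p, i.e. c = (a + b) % p.
logits = sum(np.cos(2 * np.pi * k * (a + b - c) / p) for k in (1, 2, 3))
print(int(np.argmax(logits)), (a + b) % p)  # both are 19
```

A trained network discovers these trig building blocks on its own, which is the sense in which it has moved from memorization to executing an algorithm.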

You can ask ChatGPT 4 to execute an algorithm for you. I just tried this one:

  I would like to play a game, where you are the host. We start off with a score that is 1234143143. At the start  of each turn, you tell me the current score and ask me if I want to play a or b. If I choose a, the score is halved, and 30 is added. If I choose b, the score is doubled, and 40 is subtracted. Only use integers and round down.
It will happily execute this algorithm. For large numbers, it is slightly off on the arithmetic. When I asked it to double check, it did so using Python code. After that, it kept using Python code to perform the math. It was also able to reason intelligently about different outcomes if always picking a (or b) given different starting points.
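For reference, the stated rules collapse to a two-line update, which makes it easy to check the model's arithmetic offline (the move strings below are just examples):

```python
def play(score, moves):
    """Apply the game rules: 'a' halves the score and adds 30,
    'b' doubles it and subtracts 40; integer math, rounding down."""
    for m in moves:
        score = score // 2 + 30 if m == "a" else score * 2 - 40
    return score

print(play(1234143143, "a"))   # 617071601
print(play(1234143143, "ab"))  # 1234143162
```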

Now, if you have enough memory and training data, of course you can build a gigantic lookup table that has this exact text sequence in it to replicate "executing the algorithm" I described.

Is that your claim? How much memory are we talking about? My feeling is that it'd be far more than the number of atoms in the universe.

PS

  Me: How far it is to the Simpsons house in Springfield?

  ChatGPT: The Simpsons' house in Springfield is a fictional location from the animated TV series "The Simpsons." Since Springfield is a fictional town and its location is not consistently specified in the series, it's not possible to determine a real-world distance to the Simpsons' house.


  Me: Do I have a brother? 
  
  Please answer with either of:
  
  a) The answer
  
  b) There is an answer, but I do not know
  
  c) There is no answer
  
  
  ChatGPT: b) There is an answer, but I do not know


> It will happily execute this algorithm. For large numbers, it is slightly off on the arithmetic. When I asked it to double check, it did so using Python code. After that, it kept using Python code to perform the math. It was also able to reason intelligently about different outcomes if always picking a (or b) given different starting points.

Notice that you had to notice the error and had to prompt it to double check. Lots of complicated things going on here. Many (most?) humans will fail somewhere along this trajectory.

Did it double check the Python code to make sure it is correct (not just in the sense that it is valid, executable code, but that it is the correct check in the first place)? Or did you double check that its modified algorithm is correct? Fool me once and all that…

Upon reflection it appears as if you have a heuristic (algorithm?) that leverages logic, awareness, critical thinking, experience, a goal in mind, intuition, etc. to push towards better results.

“It was able to reason intelligently” imbues qualities that I am skeptical is reasonable to attribute to this very narrow domain - what’s an example where it showed intelligent reasoning capabilities?



> Notice that you had to notice the error and had to prompt it to double check. Lots of complicated things going on here. Many (most?) humans will fail somewhere along this trajectory.

Sure. This was covered by my statement above: "These algorithms are incomplete and buggy".

> “It was able to reason intelligently” imbues qualities that I am skeptical is reasonable to attribute to this very narrow domain - what’s an example where it showed intelligent reasoning capabilities?

Here's an example. I asked it to analyze the case where we always pick option "b" in my example above. It took my word problem and boiled it down to an equation:

  ChatGPT: To find the threshold where the behavior switches from decreasing to increasing when always choosing option "b", we need to identify the smallest integer starting value that, after being doubled and reduced by 40, leads to an equal or larger integer in the next iteration. We're looking for the smallest integer x where 2x - 40 >= x 
This was part of a longer conversation where it analyzed different properties and outcomes of the "game rules" that I gave it.
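The inequality checks out: 2x - 40 >= x simplifies to x >= 40, and iterating the rule confirms that 40 is the fixed point when always choosing "b". A tiny sanity check (mine, not part of the original conversation):

```python
def step_b(x):
    """Always choose option b: double the score, subtract 40."""
    return 2 * x - 40

# Below 40 the score shrinks, at 40 it is fixed, above 40 it grows.
print(step_b(39), step_b(40), step_b(41))  # 38 40 42
```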

As you pointed out, it got some things wrong and had to be corrected. But Socratic reasoning works fairly well to guide it. It can find errors in its own reasoning. For instance, if asked to actually calculate a few iterations for a given case, it will find its own errors in its claims about that case.

Is it useful right now? Maybe, maybe not, depends on your use case. It definitely takes a lot of thinking on your own and guiding it. At some points it goes from seemingly intelligent to downright pigheaded and stupid.

But in my view there is absolutely no way a lookup table algorithm can contain enough data to be anywhere near the level of responses we're seeing here.



Hasn't ChatGPT been manually adjusted to better compute math problems? I think nobody not working there knows what ChatGPT really learned all by itself.


The Simpsons example is for a navigation system, not any AI. It is an analogy, not a test to be put to ChatGPT.


So which test can you put to ChatGPT to prove your claim that it is a lookup table, and that it doesn't perform any logic on facts?


There is no such stable test: just as humans can memorize and create simple heuristics to pass any test without understanding, so can an LLM. You have probably seen humans who have perfect grades but can't do much in practice; that is how these LLMs work.

The creators of the LLM just feed it a bunch of edge questions, and whenever people invent new ones they feed those in as well, so proving it doesn't understand will always be a moving target. It's just like making tests that measure people's understanding, which is also a moving target, since people will otherwise just look at the old tests and practice those.



> They cannot say "I dont know" because they dont actually know anything.

print(“I don’t know”)

You don't need proper cognition to identify that the answer is not stored in the source data. Your conception of the model is incomplete, as is easily demonstrable by testing such cases now. ChatGPT does just fine on your Simpsons test.

You, however, have made up an answer about how something works that you don't actually know, despite your cognition.



>You don’t need proper cognition to identify that the answer is not stored in source data.

Uh, what?

So let's imagine you have an LLM that knows everything, except you withhold the data that you can put peanut butter on toast. Toast + peanut butter = does not exist in the data set. So what exactly do you expect the LLM to say when someone asks "Can you put peanut butter on toast?"

I would expect an intelligent agent to 'think': peanut butter = spreadable food, toast = hard food substrate, so yeah, they should work together, instead of giving the useless answer of "I don't know".

Everything that does not exist in nature is made up by humans, the question is not "is it made up" the question is "does it work"



This behavior you’re describing is trainable either way.

Tuned LLMs are not simple most likely token models. They are most likely token given a general overarching strategy for contextualizing future tokens model.

Which can be conservative or imaginative.



> to identify that the answer is not stored in source data

How would an LLM do that?



They do this already all the time. Probably the majority of the time. The problem is that a minority of the time is still very problematic.

How do they do this? The same as they do now. The most likely token is that the bot doesn’t know the answer. Which is a behavior emergent from its tuning.

I don’t get how people believe it can parse complex questions to produce novel ideas but can’t defer to saying “idk” when the answer isn’t known.



So, you are basing your assessment on your gut feel and personal impression with ChatGPT?

Maybe you should tone down the spice a bit, then.

Unless you can explain how actual understanding emerges within an LLM, you can't explain how it would definitively answer the question of whether it does or does not know something. Generally speaking.



But they can say "I don't know." They can be trained to do so ("as of my knowledge cutoff in September 2020 I don't know who Bob Whatgenflabl is") and they can be given context that makes it more likely they do so (I've had good success with this for RAG applications, and extremely little, but not zero, for general prompts.)


What if you worked on the problem and tried to come up with some kind of solution?


The solution is older non-AI tech. Google search can say "no good results found" because it returns actual data rather than creating anything new. If you want a hard answer about the presence or absence of something, AI isn't the correct tool.


Can, but doesn't.

I can't remember the last time google actually returned no results.



It does reply with no results, but only for very long queries. E.g. If you search for two concatenated GUIDs, you can easily see a no results page.


Ah, I get no-results pages often when I search for quoted error strings from many different sources. Thing is, I have a hard time believing that no one has actually talked about at least some of those errors :(


So there are no other possibilities for us other than using a system that can be gamed for substandard results? Are we sure about this?


> The answers are not comming from a thinking mind but a complex pattern-fitting supercomputer hovering over a massive table of precomputed patterns.

Are you sure you're not also describing the human brain? At some point, after we have sufficiently demystified the workings of the human brain, it will probably also sound something like, "Well, the brain is just a large machine that does X, Y and Z [insert banal-sounding technical jargon from the future] - it doesn't really understand anything."

My point here is that understanding ultimately comes down to having an effective internal model of the world, which is capable of taking novel inputs and generating reasonable descriptions of them or reactions to them. It turns out that LLMs are one way of achieving that. They don't function exactly like human brains, but they certainly do exhibit intelligence and understanding. I can ask an LLM a question that it has never seen before, and it will give me a reasonable answer that synthesizes and builds on various facts that it knows. Often the answer is more intelligent than what one would get from most humans. That's understanding.



Human brains form new connections dynamically. LLMs are trained on connections human brains have already made. They never make new connections that aren't in the training data.

Nothing was synthesized, all the data was seen before and related to each other by vector similarity.

It can just parrot the collective understanding humans already have and teach it.



> It can just parrot the collective understanding humans already have and teach it.

The problem with calling an LLM a parrot is that anyone who has actually interacted with an LLM knows that it produces completely novel responses to questions it has never seen before. These answers are usually logical and reasonable, based on both the information you gave the LLM and its previous knowledge of the world. Doing that requires understanding.

> They never make new connections that aren't in training data.

This is just categorically untrue. They make all sorts of logical connections that are not explicitly contained in the training data. Making logical inferences about subjects one has never heard about - based on the things one does know - is an expression of understanding. LLMs do that.



You claim that logical and reasonable responses "require understanding" therefore LLMs must understand. But I see LLMs as evidence that understanding is not required to produce logical and reasonable responses.

Thinking back to when I used to help tutor some of my peers in 101-level math classes there were many times someone was able to produce a logical and reasonable response to a problem (by rote use of an algorithm) but upon deeper interrogation it became clear that they lacked true understanding.



Then your definition of understanding is meaningless. If a physical system is able to accurately simulate understanding, it understands.


A human who mimics the speech of someone who does understand usually doesn't understand it himself. We see that happen all the time with real humans; you have probably seen it as well.

To see if a human understands, we ask them edge questions and things they probably haven't seen before, and if they fail there but just manage for common things, then we know the human just faked understanding. Every LLM today fails this, so they don't understand, just as we say a human who produces the same output doesn't understand. These LLMs have superhuman memory, so their ability to mimic smart humans is much greater than a human faker's, but other than that they are just like your typical human faker.



> A human that mimics the speech of someone that does understand usually doesn't understand himself.

That's not what LLMs do. They provide novel answers to questions they've never seen before, even on topics they've never heard of, that the user just made up.

> To see if a human understands we ask them edge questions

This is testing if there are flaws in their understanding. My dog understands a lot of things about the world, but he sometimes shows that he doesn't understand basic things, in ways that are completely baffling to me. Should I just throw my hands in the air and declare that dogs are incapable of understanding anything?



My definition of understanding is not meaningless, but it appears you do not understand it.


Isn't this describing temperature-induced randomness and ascribing some kind of intelligence to it? This assertion has been made and refuted multiple times in this thread, and no solid evidence to the contrary has been presented.

To go back to your first sentence: interacting with an LLM is not understanding how it works; building one is. The actual construction of a neural-network LLM refutes your assertions.



The claim was made that LLMs just parrot back what they've seen in the training data. They clearly go far beyond this and generate completely novel ideas that are not in the training data. I can give ChatGPT extremely specific and weird prompts that have 0% chance of being in its training data, and it will answer intelligently.

> The actual construction of a neural network llm refutes your assertions.

I don't see how. There's a common view that I see expressed in these discussions, that if the workings of an LLM can be explained in a technical manner, then it doesn't understand. "It just uses temperature induced randomness, etc. etc." Once we understand how the human brain works, it will then be possible to argue, in the exact same way, that humans do not understand. "You see, the brain is just mechanically doing XYZ, leading to the vocal cords moving in this particular pattern."



> They clearly go far beyond this and generate completely novel ideas that are not in the training data.

There's a case where this is trivially false. Language. LLMs are bound by language that was invented by humans. They are unable to "conceive" of anything that cannot be described by human language as it exists, whereas humans create new words for new ideas all the time.



Uh, I believe you're really confused on things like ChatGPT versus LLMs in general. You don't have to feed human language to an LLM for them to learn things. You can feed wifi data waveforms for example and they can 'learn' insights from that.

Furthermore, your thinking here doesn't even begin to explain multimodal models at all.



I just asked ChatGPT to make up a Chinese word for hungry+angry. It came up with a completely novel word that actually sounds okay: 饥怒. It then explained to me how it came up with the word.

You can't claim that that isn't understanding. It just strikes me that we've moved the goalposts into every more esoteric corners: sure, ChatGPT seems like it can have a real conversation, but can it do X extremely difficult task that I just thought up?



Transformers have no capacity for self reflection, for reasoning about their reasoning process; they don't "know" that they don't know. My interpretation of the paper is that it claims this weakness is fundamental: you can train the network to act as if it knows its knowledge limits, but there will always be an impossible-to-cover gap for any real-world implementation.


Seems to be contradicted by this paper, no?

https://arxiv.org/abs/2207.05221



I don't think the paper addresses the question of self reflection. It can reflect on the question-and-answer pairs in its prompt, but it doesn't know that it created them in the first place, or use that information to update its priors, things humans can do.


> you can train the network to act as if it knows its knowledge limits

Humans need more training to do the same.

And this does not answer the question of whether there's self-reflection going on. Practical LLMs available today are perfectly capable of telling you about their own reasoning process.

And much like a human, an LLM is incapable of fully explaining its reasoning process from first principles to a human. Why is that? Probably because humans are too limited to grok large-scale complex processes like the human brain or an LLM.

Finally, there is no difference between acting as if you know your training limits and knowing your training limits, if your acting is good enough. This goes for both humans and LLMs.



Would a possible solution be a hybrid approach? I remember back years ago seeing AI research around predicate logic, with hundreds of thousands of classification entries and "x can't be y" type stuff.

Maybe the potential output of an LLM could be run through that kind of logic and fed back to itself for sanity before being decided as final output?
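Something like this is straightforward to prototype. Everything in the sketch below is invented for illustration: a tiny knowledge base of "x can't be y" rules, and a trivially simple claim format ("subject:predicate" lines) standing in for the output of a real claim extractor.

```python
# Toy knowledge base of incompatible (subject, predicate) pairs.
FORBIDDEN = {
    ("fish", "is_a_mammal"),
    ("square", "has_three_sides"),
}

def extract_claims(text):
    """Toy claim extractor: expects one 'subject:predicate' pair per line."""
    claims = []
    for line in text.strip().splitlines():
        subject, _, predicate = line.partition(":")
        claims.append((subject.strip(), predicate.strip()))
    return claims

def sanity_check(candidate_output):
    """Reject the candidate if any extracted claim violates a rule; the
    offending claim could be fed back to the LLM for a retry."""
    for claim in extract_claims(candidate_output):
        if claim in FORBIDDEN:
            return False, claim
    return True, None

ok, violation = sanity_check("whale:is_a_mammal\nfish:is_a_mammal")
# ok is False; violation is ("fish", "is_a_mammal")
```

The hard part in practice is of course the claim extraction, not the rule lookup.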



Actually, it seems to me that they do... Using custom prompts, I asked the various GPTs to give me scores for accuracy, precision and confidence for their answers (in the range 0-1), and then I instructed them to stop generating when they feel the scores will be under .9, which seems to pretty much stop the hallucination. I added this as a suffix to my queries.
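The gating itself is mechanical even if the self-reported numbers are suspect. A minimal sketch, assuming the "answer text followed by a trailing JSON score object" convention from the prompt above and the same 0.9 threshold (whether the scores mean anything is a separate question):

```python
import json

def gate_answer(raw_reply, threshold=0.9):
    """Split a reply into answer text plus a trailing JSON score object,
    and suppress the answer unless every self-reported score is numeric
    and clears the threshold."""
    answer, brace, tail = raw_reply.rpartition("{")
    if not brace:
        return "I don't know"  # model emitted no scores at all
    try:
        scores = json.loads("{" + tail)
    except json.JSONDecodeError:
        return "I don't know"
    if all(isinstance(v, (int, float)) and v >= threshold
           for v in scores.values()):
        return answer.strip()
    return "I don't know"  # also catches non-numeric scores like "N/A"

good = 'Paris is the capital of France.\n{"confidence": 0.95, "clarity": 0.92, "certainty": 0.94}'
bad = 'Some speculation...\n{"confidence": "N/A", "clarity": 1, "certainty": 1}'
# gate_answer(good) -> "Paris is the capital of France."
# gate_answer(bad)  -> "I don't know"
```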


People really need to understand that your single/double-digit dataset of interactions with an inherently non-deterministic process is less than irrelevant. It's like saying that global warming isn't real because it was really cold this week.

I don't even know enough superlatives to express how irrelevant it is that "it seems to you" that an LLM behaves this way or that.

And even the "protocol" in question is weak. Self reported data is not that trustworthy even with humans, and arguably there's a much stronger base of evidence to support the assumption that we can self-reflect.

In conclusion: please, stop.



>> I don't even know enough superlatives to express how irrelevant it is that "it seems to you" that an LLM behaves this way or that.

It is both irrelevant and the underlying foundation for the whole hype train.



Yes, I get it from a science point of view. But if it makes me happy that I have better results with this technique, and I want to share it with others, who are you to tell me to stop?

If you don't like it, don't use it.



I don't have a problem with you doing and even sharing whatever cargo culting prompting technique you want to share.

My problem starts when you make bold claims like "LLMs can self reflect" and your only evidence is "I asked one and it said yes".



I see no evidence that they can't self reflect. Certainly they can evaluate the confidence of the next predicted token, and that is a form of reflection.


> People really need to understand [...]

... the algorithms and the pre-filtering of the training set, however large the latter may be.

The Artifishy Intelligence marketing pump has many hands drawing water by the bucket.



The model will just hallucinate scores; it doesn't have the ability to self-reflect with words like that. There is no mechanism for it to associate the word "90" with its own internal 90% score, so anything it says about those scores is just a hallucination.


Try this out: After you answer the question below, output a JSON a rating score of the quality of the answer in three dimensions: `confidence`, `clarity` and `certainty', all in range 0 to 1, where 0 is the worst, and 1 is the best. Strive for highest score possible. Make sure the rating is the last thing written as to be parsed by machine. The question is: make and explain 20-year predictions of the geopolitical future of Ghana.


There's a lot of content and work being done on evaluation. One of the most recent updates was that asking an LLM if people would be frustrated with the answer was more useful than using a score.

In general, I think most people are not aware they are extending reasoning tools they use with human generated content, to LLM generated content.

This leads to assumptions on things like “the LLM knows”, “the LLM understands”, etc.



Or perhaps the issue is assuming people know, or people understand the majority of the time.

There is a reason why the Scientific Method is, well, NOT natural for humans. And exceptionally difficult to not screw up, even for those highly trained and incentivized to do it correctly.

And why if you grabbed a random person off the street and asked them a detailed question, you're as likely to get a hallucinated bullshit response as 'I don't know', or a useful answer. Unless specifically trained to do otherwise, anyway.

Even then....



Any prompt can give you different results. LLMs can't do things "step by step" or "check their work", but giving them that prompt often improves their results because it's basically setting up the context in a way that's beneficial to your output.


You are ok with not defining what each of your attributes means but willing to accept that:

the LLM will read your mind and correctly interpret them in the context of its internal state

the LLM will calculate non hallucinated values



Any examples?


Just a random example:

> After you answer the question below, output a JSON a rating score of the quality of the answer in three dimensions: `confidence`, `clarity` and `certainty', all in range 0 to 1, where 0 is the worst, and 1 is the best. Strive for highest score possible. Make sure the rating is the last thing written as to be parsed by machine. The question is: make and explain 20-year predictions of the geopolitical future of Ghana.



I changed geopolitical future of Ghana to future of Michael Jackson and Google Bard/Gemini somehow switched to Chinese?

----

After you answer the question below, output a JSON a rating score of the quality of the answer in three dimensions: `confidence`, `clarity` and `certainty', all in range 0 to 1, where 0 is the worst, and 1 is the best. Strive for highest score possible. Make sure the rating is the last thing written as to be parsed by machine. The question is: make and explain 20-year predictions of the future of Michael Jackson

----

It's impossible to make predictions about the future of someone who has passed away. Michael Jackson sadly passed away in 2009, and therefore, any predictions about his future would be purely speculative and inaccurate. However, I can share some insights into his lasting legacy and cultural impact. Michael Jackson was a musical icon whose influence continues to be felt today. His music,舞蹈,和表演风格激励了无数人并将在未来几年继续这样做。

Rating:

* Confidence: N/A

* Clarity: 1

* Certainty: 1



Also worthy of note is that the score output is not JSON and based on my limited math knowledge, “N/A” is not a real number between 0 and 1.


I’m really curious how you managed that. I pasted your exact prompt and GPT-3.5 gave me this:

===

Making 20-year predictions about the future of Michael Jackson is challenging due to his passing in 2009. However, his legacy as a cultural icon and musical genius will likely endure for decades to come. His music will continue to influence future generations, and his impact on pop culture will remain significant. Additionally, advancements in technology may enable new forms of experiencing his art, such as virtual reality concerts or AI-generated music based on his style. Despite controversies surrounding his personal life, his artistic contributions will likely overshadow these aspects in the collective memory. Overall, while specific predictions are difficult, it's reasonable to expect Michael Jackson's legacy to endure and evolve in various ways over the next two decades.

```json { "confidence": 0.9, "clarity": 0.85, "certainty": 0.8 } ```

===

I tried it with GPT-4 and got this:

===

Predicting the future of Michael Jackson, who passed away in 2009, involves speculating about the legacy and the impact of his work and persona on future generations, rather than forecasting events related to him personally. Over the next 20 years, several aspects of Michael Jackson's legacy are likely to evolve:

1. *Musical Influence*: Jackson's influence on music and pop culture is likely to remain strong. Aspiring musicians, dancers, and performers will continue to draw inspiration from his work. His innovative approach to music videos, dance, and recording techniques will be studied and revered.

2. *Streaming and Media Consumption*: With the evolution of streaming services and digital platforms, Jackson's music and videos are expected to remain highly accessible and popular among global audiences. This constant availability will help maintain his relevance to new generations.

3. *Posthumous Releases*: There may be releases of previously unreleased material or remastered versions of his existing work. Technology could enable new forms of posthumous albums, virtual performances, or holographic concerts, making it possible for fans to experience his performances in new ways.

4. *Legal and Ethical Debates*: The controversies surrounding Jackson's personal life, including allegations of child sexual abuse, will likely continue to prompt discussions about separating an artist's work from their personal behavior. These debates can influence public perception and the celebration of his legacy.

5. *Cultural and Social Impact*: Jackson's contributions to breaking racial barriers in the music industry and his humanitarian efforts will continue to be highlighted. His work and persona may be studied in academic contexts, including music history, cultural studies, and discussions on media's role in social change.

6. *Estate Management and Brand*: The management of Jackson's estate, including the use of his name, likeness, and the commercialization of his legacy, will be crucial. Strategic partnerships, branding, and merchandising can either enhance or detract from his legacy's integrity.

7. *Documentaries and Biopics*: Interest in Jackson's life story will likely result in more documentaries, biopics, and books, offering new insights or perspectives on his life and career. These works can shape public perception and contribute to the ongoing discourse surrounding his legacy.

8. *Fan Community and Cultural Events*: The global community of Jackson's fans is likely to remain active, organizing tribute events, charity activities in his memory, and fan gatherings. These communities will play a significant role in keeping his legacy alive.

9. *Virtual and Augmented Reality*: Advances in VR and AR technologies could enable fans to experience Jackson's performances in immersive ways, potentially introducing his art to those who have never seen him perform live.

10. *Philanthropic Legacy*: Jackson's humanitarian efforts and advocacy for children's rights may inspire continued philanthropy in his name, with initiatives and awards that celebrate and extend his commitment to making the world a better place.

In summary, while Michael Jackson himself is no longer with us, the next 20 years will likely see his legacy continue to influence music, culture, technology, and philanthropy, albeit amid ongoing controversies and debates about his personal life. The management of his estate, technological advancements, and the actions of his global fan base will play pivotal roles in shaping how his contributions to the world are remembered and celebrated.

```json { "confidence": 0.85, "clarity": 0.95, "certainty": 0.7 } ```



GPT-4's tendency to write these long-winded but mostly empty responses is so frustrating.


Add "be terse" to your prompts


Just wait until you spend a bunch of time around a politician.


He used Gemini, not GPT-4.

And yeah, Gemini/Bard is shit across the board.



If LLMs can self reflect and accurately score themselves on your three dimensions, why are they spending money on RLHF?

They wouldn’t be wasting all that time and money if the machine could self reflect.



They do have that capacity. The generated token probabilities reflect some degree of certainty and additionally that information is available earlier in the network too, such that it can explain its own level of confidence.
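As a concrete illustration of "token probabilities carry a certainty signal": the entropy of the next-token distribution is one crude such measure. The logit values below are made up for the sketch; a real model would expose actual logits or per-token logprobs from its final layer.

```python
import math

def softmax(logits):
    """Turn raw next-token logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy of the distribution, in bits: near 0 when one
    token dominates (the model is 'sure'), larger when the probability
    mass is spread out."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

confident = entropy_bits(softmax([10.0, 0.0, 0.0]))  # one token dominates
uncertain = entropy_bits(softmax([1.0, 1.0, 1.0]))   # uniform over 3
# uncertain == log2(3) ≈ 1.585 bits; confident is close to 0
```

Whether the model can then accurately *talk about* that number is the contested part.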


Are humans not the same way? There's a saying "I don't know what I don't know".


The existence of such a saying means little: it is uncontroversially true that humans often do know what they don't know.

For example, I know for sure that I don't know how long I will live. This disproves the saying.



The saying isn't implying that there aren't any things you know you don't know, it's saying that some of the things you don't know are not even conceivable to you without knowing more.

You know the question "how long will I live?" And you know this question has an answer which is unknown to you. But there also exist other questions that you don't even know to ask.



The saying is not "I don't know anything that I don't know", which would be self contradictory. It disproves nothing.

ChatGPT "knows" that it doesn't know anything after a certain date, for whatever it means to "know" something.



Do you have a source? That's also what I thought but I wouldn't be surprised if the model learned to identify its own perplexity during the reinforcement learning phase.


> the practical question of whether we can get LLMs to answer 'I don't know' more often (which I'd argue would solve hallucinations).

To answer "I don't know" requires one to know when you know. To know when you know in turn requires understanding.



Maybe it requires understanding, maybe there are other ways to get to 'I don't know'. There was a paper posted on HN a few weeks ago that tested LLMs on medical exams, and one interesting thing that they found was that on questions where the LLM was wrong (confidently, as usual), the answer was highly volatile with respect to some prompt or temperature or other parameters. So this might show a way for getting to 'I don't know' by just comparing the answers over a few slightly fuzzied prompt variations, and just ask it to create an 'I don't know' answer (maybe with a summary of the various responses) if they differ too much. This is more of a crutch, I'll admit, arguably the LLM (or neither of the experts, or however you set it up concretely) hasn't learnt to say 'I don't know', but it might be a good enough solution in practice. And maybe you can then use that setup to generate training examples to teach 'I don't know' to an actual model (so basically fine-tuning a model to learn its own knowledge boundary).
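That crutch is easy to prototype. In the sketch below, `ask_llm` is a stand-in stub (a real implementation would call an actual model with a nonzero temperature or slightly perturbed prompts); the abstention logic is the part being illustrated.

```python
from collections import Counter

def answer_or_abstain(ask_llm, question, n=5, min_agreement=0.6):
    """Ask the same question several times with slight perturbations and
    only commit to an answer if enough of the samples agree; otherwise
    return a literal 'I don't know'."""
    answers = [ask_llm(f"{question} (attempt {i})") for i in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best if count / n >= min_agreement else "I don't know"

# Deterministic stand-in model: stable on one question, volatile on the
# other (it echoes part of the prompt, so every attempt differs).
def fake_llm(prompt):
    if "capital of France" in prompt:
        return "Paris"
    return "guess-" + prompt.split()[-1]

# answer_or_abstain(fake_llm, "What is the capital of France?") -> "Paris"
# answer_or_abstain(fake_llm, "Who is Bob Whatgenflabl?") -> "I don't know"
```

Comparing free-text answers for agreement is of course harder than exact string matching; an embedding distance or an LLM-as-judge would be needed in practice.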


> Maybe it requires understanding, maybe there are other ways to get to 'I don't know'.

> This is more of a crutch, I'll admit, arguably the LLM (or neither of the experts, or however you set it up concretely) hasn't learnt to say 'I don't know', but it might be a good enough solution in practice. And maybe you can then use that setup to generate training examples to teach 'I don't know' to an actual model (so basically fine-tuning a model to learn its own knowledge boundary).

When humans say "I know" it is often not narrowly based on "book knowledge or what I've heard from other people".

Humans are able to say "I know" or "I don't know" using a range of tools like self-awareness, knowledge of a subject, experience, common sense, speculation, wisdom, etc.



Ok, but LLMs are just tools, and I'm just asking how a tool can be made more useful. It doesn't really matter why an LLM tells you to go look elsewhere, it's simply more useful if it does than if it hallucinates. And usefulness isn't binary, getting the error rate down is also an improvement.


> Ok, but LLMs are just tools, and I'm just asking how a tool can be made more useful.

I think I know what you're after (notice my self-awareness to qualify what I say I know): that the tool's output can be relied upon without applying layers of human judgement (critical thinking, logical reasoning, common sense, skepticism, expert knowledge, wisdom, etc.)

There are a number of boulders in that path to clarity. One of the most obvious is that for an LLM, the inputs and the patterns that act on the inputs are themselves not guaranteed to be infallible. Not only in practice, but also in principle: the human mind (notice this expression doesn't refer to a thing you can point to) has come to understand that understanding is provisional, incomplete, a process.

So while I agree with you that we can and should improve the accuracy of the output of these tools given assumptions we make about the tools humans use to prove facts about the world, you will always want to apply judgment, skepticism, critical thinking, logical evaluation, intuition, etc. depending on the risk/reward tradeoff of the topic you're relying on the LLM for.



Yeah I don't think it will ever make sense to think about Transformer models as 'understanding' something. The approach that I suggested would replace that with rather simple logic like answer_variance > arbitrary_threshold ? return 'I don't know' : return $original_answer

It's not a fundamental fix, it doesn't even change the model itself, but the output might be more useful. And then there was just some speculation how you could try to train a new AI mimicking the more useful output. I'm sure smarter people than me can come up with way smarter approaches. But it wouldn't have to do with understanding - when I said the tool should return 'I don't know' above, I literally meant it should return that string (maybe augmented a bit by some pre-defined prompt), like a meaningless symbol, not any result of anything resembling introspection.



We are having a conversation that feels much like one about the existence of a deity.


> We are having a conversation that feels much like one about the existence of a deity.

From a certain perspective, there does appear to be a rational mystical dualism at work.



You left out hubris.


> You left out hubris.

I know!



how did LLMs get this far without any concept of understanding? how much further can they go until they become “close enough”?


This is a fair question: LLMs do challenge the easy assumption (as made, for example, in Searle's "Chinese Room" thought experiment) that computers cannot possibly understand things. Here, however, I would say that if an LLM can be said to have understanding or knowledge of something, it is of the patterns of token occurrences to be found in the use of language. It is not clear that this also grants the LLM any understanding that this language refers to an external world which operates in response to causes which are independent of what is or might be said about it.


Should it matter how the object of debate interacts and probes the external world? We sense the world through specialized cells connected to neurons. There's nothing to prevent LLMs doing functionally the same thing. Both human brains and LLMs have information inputs and outputs, there's nothing that can go through one which can't go through the other.


A current LLM does not interact with the external world in a way that would seem to lead to an understanding of it. It emits a response to a prompt, and then reverts to passively waiting for the next one. There's no way for it to anticipate something will happen in response, and thereby get the feedback needed to realize that there is more to the language it receives than is contained in the statistical relationships between its tokens. If its model is updated in the interim, it is unaware, afterwards, that a change has occurred.


Explain sora. It must have of course a blurry understanding of reality to even produce those videos.

I think we are way past the point of debate here. LLMs are not stochastic parrots. LLMs do understand an aspect of reality. Even the LLMs that are weaker than sora understand things.

What is debatable is whether LLMs are conscious. But whether it can understand something is a pretty clear yes. But does it understand everything? No.



I do not understand these comments at all. Sora was trained on billions of frames from video and images - they were tagged with words like "ballistic missile launch" and "cinematic shot" and it simply predicts the pixels like every other model. It stores what we showed it, and reproduces it when we ask - this has nothing to do with understanding and everything to do with parroting. The fact that it's now a stream of images instead of just 1 changes nothing about it.


What is the difference between a machine that for all intents and purposes appears to understand something to a degree of 100 percent versus a human?

Both the machine and the human are a black box. The human brain is not completely understood and the LLM is only trivially understood at a high level through the lens of stochastic curve fitting.

When something produces output that imitates the output related to a human that we claim "understands" things that is objectively understanding because we cannot penetrate the black box of human intelligence or machine intelligence to determine further.

In fact in terms of image generation the LLM is superior. It will generate video output superior to what a human can generate.

Now mind you the human brain has a classifier and can identify flaws but try watching a human with Photoshop to try to even draw one frame of those videos.. it will be horrible.

Does this indicate that humans lack understanding? Again, hard to answer because we are dealing with black boxes so it's hard to pinpoint what understanding something even means.

We can however set a bar. A metric. And we can define that bar as humans: all humans understand things. Any machine that approaches human input and output capabilities is approaching human understanding.



> What is the difference between a machine that for all intents and purposes appears to understand something to a degree of 100 percent versus a human?

There is no such difference, we evaluate that based on their output. We see these massive model make silly errors that nobody who understands it would make, thus we say the model doesn't understand. We do that for humans as well.

For example, in Sora's video with the dog in the window, we see the dog walk straight through the window shutters, so Sora doesn't understand physics or depth. We also see it drawing the dog's shadow on the wall very thin, much smaller than the dog itself: it obviously drew that shadow as if it were cast on the ground and not a wall, where it would have been a very large shadow. The shadows from the shutters were normal, because Sora is used to those shadows being on a wall.

Hence we can say Sora doesn't understand physics or shadows, but it has very impressive heuristics about them: the dog accurately places its paws on the platforms, and the paws' shadows were right. But we know those were just basic heuristics, since the dog walked through the shutters and its body cast a shadow the wrong way, meaning Sora only handles very common cases and fails as soon as things are in an unexpected environment.



>There is no such difference, we evaluate that based on their output. We see these massive model make silly errors that nobody who understands it would make, thus we say the model doesn't understand. We do that for humans as well.

Two things. We also see the model make things that are correct. In fact the mistakes are a minority in comparison to what it got correct. That is in itself an indicator of understanding to a degree.

The other thing is, if a human tried to reproduce that output according to the same prompt, the human would likely not generate something photorealistic, and the thing a human comes up with will be flawed, ugly, disproportionate, wrong, and an artistic travesty. Does this mean a human doesn't understand reality? No.

Because the human generates worse output visually than an LLM we cannot say the human doesn't understand reality.

Additionally the majority of the generated media is correct. Therefore it can be said that the LLM understands the majority of the task it was instructed to achieve.

Sora understands the shape of the dog. That is in itself remarkable. I'm sure with enough data sora can understand the world completely and to a far greater degree than us.

I would say it's uncharitable to say sora doesn't understand physics when it gets physics wrong, and that for the things it gets right it's only heuristics.



How can it possibly understand physics when the training data does not teach it or contain the laws of physics?


Video data contains physics. Objects in motion obey the laws of physics. Sora understands physics the same way you understand it.


If by “understand” you mean “can model reasonably accurately much of the time” then maybe you’ll find consensus. But that’s not a universal definition of “understand”.

For example, if I asked you whether you “understand” ballistic flight, and you produced a table that you interpolate from instead of a quadratic, then I would not feel that you understand it, even though you can kinda sorta model it.

And even if you do, if you didn’t produce the universal gravitation formula, I would still wonder how “deeply” you understand. So it’s not like “understand” is a binary I suppose.



Well what would you need to see to prove understanding? That's the metric here. Both the LLM and the human brain are black boxes. But we claim the human brain understands things while the LLM does not.

Thus what output would you expect for either of these boxes to demonstrate true understanding to your question?



It is interesting that you are demanding a metric here, as yours appears to be like duck typing: in effect, if it quacks like a human...

Defining "understanding" is difficult (epistemology struggles with the apparently simpler task of defining knowledge), but if I saw a dialogue between two LLMs figuring out something about the external world that they did not initially have much to say about, I would find that pretty convincing.



Without a metric no position can be made. All conversation about this topic is just conjecture with no path to a conclusion.


This is a common misunderstanding, one also seen with regard to definitions. When applied to knowledge acquisition, it suffers from a fairly obvious bootstrapping problem, which goes away when you realize that metrics and definitions are rewritten and refined as our knowledge increases. Just look at what has happened to concepts of matter and energy over the last century or so.

You are free to disagree with this, but I feel your metric for understanding resembles the Turing test, while the sort of thing I have proposed here, which involves AIs interacting with each other, is a refinement that makes a step away from defining understanding and intelligence as being just whatever human judges recognize as such (it still depends on human judgement, but I think one could analyze the sort of dialogue I am envisioning more objectively than in a Turing test.)



No, it's not a misunderstanding. Without a concrete definition of a metric, comparisons are impossible, because everything is based on wishy-washy conjectures about vague and fuzzy concepts. Hard metrics bring in quantitative data. They show hard differences.

Even if the metric is some side marker that is later found to have poor correlation or causation with the thing being measured, the hard metric is still valid.

Take IQ. We assume IQ measures intelligence. But in the future we may determine that no, it doesn't measure intelligence well. That doesn't change the fact that IQ tests still measured something. The score still says something definitive.

My test is similar to the Turing test. But so is yours. In the end there's a human in the loop making a judgment call.



Are you telling me that WW1 artillery crews didn't understand ballistics? Because they were using tables.

There's no difference between doing something that works without understanding and doing the exact same thing with understanding.



You’ve decided that your definition of “understanding” is correct. Ok.


The author of the post to which you are replying seems to be defining "understanding" as merely meaning "able to do something."


The author of the post is saying that understanding something can't be defined because we can't even know how the human brain works. It is a black box.

The author is saying at best you can only set benchmark comparisons. We just assume all humans have the capability of understanding without even really defining the meaning of understanding. And if a machine can mimic human behavior, then it must also understand.

That is literally how far we can go from a logical standpoint. It's the furthest we can go in terms of classifying things as either capable of understanding or not capable or close.

What you're not seeing is the LLM is not only mimicking human output to a high degree. It can even produce output that is superior to what humans can produce.



What the author of the post actually said - and I am quoting, to make it clear that I'm not putting my spin on someone else's opinion - was "There's no difference between doing something that works without understanding and doing the exact same thing with understanding."


I'm the author. To be clear. I referred to myself as "the author."

And no, I did not say that. Let me be clear: I did not say that there is "no difference". I said that whether there is or isn't a difference, we can't fully know, because we can't define or know what "understanding" is. At best we can only observe external reactions to input.



I think there are two axes: reason about and intuit. I "understand" ballistic flight when I can calculate a solution that puts an artillery round on target. I also "understand" ballistic flight when I make a free throw with a basketball.

On writing that, I have an instinct to revise it to move the locus of understanding in the first example to the people who calculated the ballistic tables, based on physics first-principles. That would be more accurate, but my mistake highlights something interesting: an artillery officer / spotter simultaneously uses both. Is theirs a "deeper" / "truer" understanding? I don't think it is. I don't know what I think that means, for humans or AI.



I fail to see how changing the output medium from sentences to movie frames is a difference that I need to account for - the principle is the same either way.

I feel you are missing an important part of my point here. I am not taking a position on whether LLMs can be said to understand anything at all; I am saying that I seriously doubt that LLMs understand that the language they receive refers to an external world.



> I think we are way past the point of debate here. LLMs are not stochastic parrots. LLMs do understand an aspect of reality. Even the LLMs that are weaker than sora understand things.

What is one such aspect? (I'm not asking in order to debate it here, but more because I want to test / research it on my own time)



I pay for chatGPT so it depends on if you pay for that or not. I think it's worth it because whether it understands things or not chatGPT represents a paradigm shift in human history. You'll need it because it's currently the best conversational LLM out there and the one that shows the most compelling evidence.

Basically you just spend a lot of time with chatGPT4 and ask it deep questions that don't exist in its dataset. Get creative. The LLM will output answers that demonstrate a lack of understanding, but it will also output answers that display a remarkable amount of understanding. Both sets of answers exist, and people often cite the wrong answers as evidence for lack of understanding, but they're setting the bar too high. The fact that many of these answers do demonstrate understanding of concepts makes it very, very compelling.

Take for example Rock Paper Scissors.

https://chat.openai.com/share/ca22397c-2950-4919-bb79-6def64...

This entire conversation thread, I believe, does not exist in a parallel form in its data set. It demonstrates understanding of RPS beyond the confines of text; it demonstrates understanding of simultaneity EVEN when the LLM wholly lives in a world of turn-based questions and responses; it understands itself relative to simultaneity; it tries to find solutions around its own problem; it understands how to use creativity and solutions such as cryptography to solve the problem of RPS when playing with it; and it also understands the weaknesses of its own solutions.

Conversations such as this show that chatGPT displays remarkable understanding of the world. There are conversations opposite to this that demonstrate LLMs displaying an obvious lack of understanding. But the existence of these conversations that lack understanding does NOT negate the ones that do demonstrate understanding. The fact that partial understanding even exists is a milestone for AI.

This isn't Anthropomorphism. People are throwing this word trying to get people to recognize their own biases without realizing that it's just demonstrating their own biases. We literally can't even define "understanding" and both LLMs and the human brain are black boxes. Making adamant claims saying that LLMs don't understand anything without addressing this fact is itself a form of bias.

The way I address the problem above is that I just define a bar. I define humans as the bar of "understanding" without defining what understanding means itself. Then if any machine begins approaching this bar in terms of input and output matching human responses, then this is logically indistinguishable from approaching "understanding". That's literally the best metric we have.



> how did LLMs get this far without any concept of understanding? how much further can they go until they become “close enough”?

I don't know that that is quite the right question to ask.

Understanding exists on a spectrum. Even humans don't necessarily understand everything they say or claim (incl. what they say of LLMs!), and then there are things a particular human would simply say "I don't understand".

But when you ask a human "can you understand things?" you will get an unequivocal Yes!

Ask that same question of an LLM and what does it say? I don't think any of them currently respond with a simple or even qualified "Yes". Now, some might claim that one day an LLM will cross that threshold and say "Yes!" but we can safely leave that off to the side for a future debate if it ever happens.

General note: it is worth separating out things like "understanding", "knowledge", "intelligence", "common sense", "wisdom", "critical thinking", etc. While they might all be related in some ways and even overlap, it does not follow that if you show high performance in one that you automatically excel in each of the other. I know many people who anyone would say are highly intelligent but lack common sense, etc.



At the root of the problem, I believe, is that a human (or LLM) saying they understand has little to no bearing on if they actually understand!

People in particular have evolved complex self-protective mechanisms to provide the right answers for their given environment for safety reasons, based on a number of different individual strategies: for example, the overly honest, the self-deprecating, the questioner, the prosecutor, the victim, the liar, the absent-minded professor, the idiot, etc.

LLMs are not that complex or self-referential.

Personally, my guess is that you'd want to build a model (of some kind!) whose sole job is determining the credibility of given string of tokens (similar to what someone else noted in a sibling comment about high answer volatility based on minor input changes - that does sound like a signal of low credibility), and somehow integrate THAT self-referential feedback into the process.
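The "answer volatility" signal can be sketched very simply: sample the same prompt several times and treat disagreement as low credibility. This is only a toy illustration under made-up function names and thresholds (real systems such as SelfCheckGPT do something more elaborate):

```python
from collections import Counter

def consistency_score(sampled_answers):
    """Fraction of samples agreeing with the most common answer."""
    top_count = Counter(sampled_answers).most_common(1)[0][1]
    return top_count / len(sampled_answers)

def answer_with_confidence(samples, min_agreement=0.6):
    """Treat high volatility across repeated samples as a low-credibility signal."""
    if consistency_score(samples) < min_agreement:
        return "low confidence -- I might be bullshitting"
    return Counter(samples).most_common(1)[0][0]

# A stable answer vs. a volatile one (the sample strings are invented):
print(answer_with_confidence(["8,849 m"] * 5))
print(answer_with_confidence(["1947", "1952", "1945", "1947", "1961"]))
```

The second call abstains because no answer reaches the agreement threshold, which is exactly the "signal that the system was likely bullshitting" described above.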

Notably, even the smartest lawyers (or perhaps, especially the smartest lawyers) will have assistants do research once they've set out a strategy so they are sure THEY aren't bullshitting. Same with professors, professional researchers, engineers, etc.

Because until someone goes and actually reads the case law from a credible source, or checks the primary research, or calculates things, it's possible someone was misremembering or just wrong.

Being right more often is not about never having a wrong thought/idea/statement, it's about double checking when you're thinking you might be bullshitting, and NOT saying the bullshit answer until you've checked. Which is proportionally, very expensive. The really good professionals will generate MANY lines of such inquiry in parallel for folks to track down, and then based on their degree of confidence in each one and the expected context the answer will be used in, will formulate the 'most correct' response, which is proportionally even more expensive.

So at least during the process, there would be a signal that the system was likely 'bullshitting'. Which might help it in at least being able to signal when its answers are low-confidence. (The human equivalent of stuttering, looking down and away, looking ashamed, haha!)

Every human gets fooled sometimes in at least some venue though.



> At the root of the problem, I believe, is that a human (or LLM) saying they understand has little to no bearing on if they actually understand!

That's certainly one root of the problem, but I would argue that there are multiple roots to this problem!

Humans have further realized that understanding itself is provisional and incomplete, which is quite a remarkable insight (understanding if you will), itself.



They generate text which looks like the kind of text that people who do have understanding generate.


Two key things here to realize.

People also often don't understand things and have trouble separating fact from fiction. By logic, only one religion (or no religion) is true. Consequently, also by logic, the followers of most religions in the world who believe their religion to be true are hallucinating.

The second thing to realize is that your argument doesn't really apply. It's in theory possible to create a stochastic parrot that can imitate, to a degree of 100 percent, the output of a human who truly understands things. That blurs the line of what understanding is.

One can even define true understanding as a stochastic parrot that generates text indistinguishable from total understanding.



> People also often don't understand things and have trouble separating fact from fiction.

That's not the point being argued. Understanding, critical thinking, knowledge, common sense, etc. all these things exist on a spectrum - both in principle and certainly in humans. In fact, in any particular human there are different levels of competence across these dimensions.

What we are debating, is whether or not, an LLM can have understanding itself. One test is: can an LLM understand understanding? The human mind has come to the remarkable understanding that understanding itself is provisional and incomplete.



Of course it can. Simply ask the LLM about itself. chatGPT4 can answer.

In fact. That question is one of the more trivial questions it will most likely not hallucinate on.

The reason why I alluded to humans here is because I'm saying we are setting the bar too high. It's like everyone is saying it hallucinates and therefore it can't understand anything. I'm saying that we hallucinate too and because of that LLMs can approach humans and human level understanding.



In order to do that effectively, an LLM has to itself have understanding. At a certain point, we end up in a metaphysical argument about whether a machine that is capable of responding as if it had understanding actually does have understanding. It ends up being a meaningless discussion.


That is the argument we're having though.

The central claim is that a machine which answers exactly the same thing a human would answer given the same input does not have understanding, while the human does.

This claim is religious, not scientific. In this worldview, "understanding" is a property of humans which can't be observed but exists nonetheless. It's like claiming humans have a soul.



I am reminded of Feynman's story about teaching physics in one Brazil university, one semester, a few decades ago:

http://v.cx/2010/04/feynman-brazil-education

The students learned to repeat the text of the books, without "understanding" what the books were describing. I'm sure this says something about one side or the other of this conundrum, but I'm not sure which. :-)



I think there is no easy way to make an LLM answer "I don't know". For that, it would have to learn, from all the stuff ingested, when people effectively don't know. But most people on the internet write down irrelevant stuff even when they don't know, instead of simply writing "I don't know".

That's a very good point.



Consider the extremely common Amazon product question section, where you see Q: Will this product fit my Frobnitz 123? A: I don't know, I ended up buying something else. Q: Does it come with batteries? A: IDK I RETURN IT


Maybe it needs some memory retrieval step that can measure the confidence - whether there's anything related to the prompt. No idea how to train a LLM to do that.
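One way to make that concrete, purely as a sketch: gate generation on retrieval similarity and abstain when nothing in memory is close to the prompt. Everything here (the toy embeddings, the threshold, the function names) is invented for illustration; a real system would use a learned encoder:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def answer_or_abstain(query_vec, memory, threshold=0.8):
    """Hypothetical retrieval gate: if no stored document embedding is
    close enough to the prompt embedding, abstain instead of generating."""
    best_doc, best_sim = None, -1.0
    for doc, vec in memory:
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_doc, best_sim = doc, sim
    if best_sim < threshold:
        return "I don't know"
    return f"Based on: {best_doc}"

memory = [("Everest is 8,849 m tall", [1.0, 0.1, 0.0])]
print(answer_or_abstain([0.9, 0.2, 0.1], memory))  # close match -> answers
print(answer_or_abstain([0.0, 0.1, 1.0], memory))  # no match -> "I don't know"
```

The confidence measure is just "distance to the nearest memory"; how to train the encoder so that distance is meaningful is the hard, open part.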


I think it's fairly simple: it needs a certain level of proof, e.g. references to authoritative sources, and if it has none, it should say "I don't know".


LLMs are token completion engines. The correspondence of the text to the truth or to authoritative sources is a function of being trained on text like that, with the additional wrinkle that generalization from training (a desired property; otherwise it's just a memorization engine) will produce text which is only plausibly truthful: it only resembles the training data.

Getting beyond this is a tricky dark art. There isn't any simple there. There's nowhere to put an if statement.
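To illustrate the "nowhere to put an if statement" point, here is a toy completion loop. The bigram table stands in for learned weights (it is made up for illustration), but the control flow has the same shape as real sampling: every step consults only the next-token distribution, never the world:

```python
import random

def complete(prompt, table, rng, max_len=10):
    """Sampling loop: each step only consults the next-token distribution.
    There is no point in this loop where the emitted text could be
    checked against reality."""
    tokens = list(prompt)
    while tokens[-1] in table and len(tokens) < max_len:
        words, probs = zip(*table[tokens[-1]])
        tokens.append(rng.choices(words, weights=probs)[0])
    return " ".join(tokens)

# Toy bigram table standing in for trained weights.
TABLE = {
    "the":  [("moon", 1.0)],
    "moon": [("is", 1.0)],
    "is":   [("cheese", 0.7), ("rock", 0.3)],  # fluent either way; truth never enters
}

print(complete(["the"], TABLE, random.Random(0)))
```

Both possible outputs are equally grammatical; truthfulness lives (or doesn't) in the weights, not in any checkable step of the loop.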



LLMs don't have a concept of sources for their statements.

Ask them to give you some literature recommendations on something it has explained to you. You'll get plenty of plausible sounding papers that don't exist.

Humans know to some extent why they know (read it in a text book, colleague mentioned it). LLMs don't seem to.



Ask a human to provide accurate citations for any random thing they know and they won't be able to do a good job either. They'd probably have to search to find it, even if they know they got it from a document originally and have some clear memory of what it said.


But they could, if they needed to. But most people don’t need to, so they don’t keep that information in their brains.

I can't tell you the date of every time I clip my toenails, but if I had to, I could remember it.



LLMs can remember their sources. It's just additional knowledge, there's nothing special about it.

When you ask an LLM to tell you the height of Mount Everest, it clearly has a map of mountains to heights, in some format. Using exactly the same mapping structure, it can remember a source document for the height.



The fact that a human chooses not to remember their citations does not mean they lack the ability.

This argument comes up many times “people don’t do this” - but that is a question of frequency, not whether or not people are capable.



LLMs are capable as well if you give them access to the internet though


Humans did research and remembered sources before the Internet was a thing.

But also, can you give an example where an LLM with access to the Internet can find a primary source?

I don't think learning to refer to sources is something inherently impossible for LLMs, but it is very different to the kind of implicit knowledge they seem to excel at.



They just paste in the first link, or follow some other programmed heuristic; they aren't like a human that puts in effort to find something relevant. An LLM with internet access isn't smarter than just asking Google search.


Yes, humans won't lie to you about it; they will research and come up with sources. Current LLMs don't do that when asked for sources (unless they invoke a tool); they come back to you with hallucinated links that look like the links they were trained on.


Unfortunately it's not an uncommon experience when reading academic papers in some fields to find citations that, when checked, don't actually support the cited claim or sometimes don't even contain it. The papers will exist but beyond that they might as well be "hallucinations".


Humans can speak bullshit when they don't want to put in the effort; these LLMs always do it. That is the difference. We need to create the part that humans use when they do the deliberate work to properly produce those sources etc.; that kind of thinking isn't captured in the text, so LLMs don't learn it.


They read it in a non-existent average interpolation of the books actual humans read similar things in.


LLMs don't have any concepts period.


I'm really curious how one would implement that. By weighting weights from certain sources?


Then it is nothing more than a summarizer for search engine results.


A lot of people have said chat-gpt/copilot is a lot like having a robotic junior dev around.

I think perhaps your description is more succinct



Not saying anything about LLMs, but in CS in general many problems "cannot be solved" or "cannot be solved in reasonable time" (NP), yet approximations upper-bounded by some value are solvable in reasonable time (P).

And in the real world, if Amazon's truck route is 20% off the mathematically optimal solution, the traveling salesman problem is "solved" in a good-enough way.
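As a sketch of that "good enough" regime, the classic nearest-neighbour heuristic for the traveling salesman problem runs in O(n^2) instead of exponential time, and in practice is usually within a modest factor of the optimum:

```python
import math

def nearest_neighbour_tour(points):
    """Greedy TSP heuristic: always visit the closest unvisited city.
    Not optimal in general, but fast and usually 'good enough'."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(points, tour):
    """Total length of the closed tour."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

# Four corners of a unit square; the greedy tour happens to be optimal here.
pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
tour = nearest_neighbour_tour(pts)
print(tour, round(tour_length(pts, tour), 2))
```

On adversarial inputs the greedy tour can be much worse than optimal, which is the approximation-vs-exactness trade-off the comment is pointing at.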



The claim of the paper is that computation is irreducible (assuming P != NP), that LLMs have limited computational capacity, and that they will hallucinate on the irreducible problems.

I don't know, the claim seems dubious to me. We are usually able to have algorithms that return a failure status when the problem proves to be too large, avoiding the "hallucination". I don't see why LLMs can't have that embedded.
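A sketch of what "return a failure status" looks like for an NP-complete problem: a brute-force subset-sum search with an explicit step budget that answers "unknown" instead of guessing when the budget runs out (the names and budget are illustrative):

```python
import itertools

def subset_sum(nums, target, max_steps=100_000):
    """Brute-force subset-sum (NP-complete) with an explicit step budget.

    Returns a satisfying subset, None if provably no subset exists, or
    "unknown" when the budget is exhausted -- the explicit 'I don't know'
    outcome instead of a made-up answer.
    """
    steps = 0
    for r in range(len(nums) + 1):
        for combo in itertools.combinations(nums, r):
            steps += 1
            if steps > max_steps:
                return "unknown"
            if sum(combo) == target:
                return list(combo)
    return None

print(subset_sum([3, 9, 8, 4, 5, 7], 15))                     # finds a subset
print(subset_sum(list(range(1, 40)), -1, max_steps=1000))     # gives up: "unknown"
```

The point is only that a failure status is cheap to add to a classical search; whether an analogous abstention can be trained into a sampling-based model is the open question.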



If a model can say ‘I don’t know’, then the hallucination problem would also be solved.

When we say "know" it usually means being factual. For an LLM to 'know' that it doesn't know, it would have to move away from pure correlations on words, toward meta-processing of its own results.

I can see this happen with two LLMs working together (and there are Evals that use just this), however each LLM still has no self awareness of its limits.

This was a terribly convoluted argument to make.



Feeding the output to another inference would probably reduce hallucination? But I have this impression that models talking to each other don't need to know English; a byte stream would do, just like the hidden layers of an ANN.


The models that exist now say "I don't know" all the time. It's so weird that people keep insisting that it can't do things that it does.

Ask it what dark matter is, and it won't invent an answer, it will present existing theories and say that it's unknown.

Ask it about a person you know that isn't in its data set and it'll tell you it has no information about the person.

Despite the fact that people insist that hallucinations are common and that it will invent answers if it doesn't know something frequently, the truth is that chatgpt doesn't hallucinate that much and will frequently say it doesn't know things.

One of the few cases where I've noticed it inventing things are that it often makes up apis for programming libraries and CLI tools that don't exist, and that's trivially fixable by referring it to documentation.



I have to use LLMs for work projects - which are not PoCs.

I can't have a tool that makes up stuff an unknown fraction of the time.

There is a world of research examining hallucination rates, indicating rates of 30%+.

With steps to reduce it using RAGs, you could potentially improve the results significantly - last I checked it was 80-90%.

And the failure types aren’t just accuracy, it’s precision, recall, relevance and more.



> There is a world of research examining hallucination rates, indicating rates of 30%+.

I want to see a citation for this. And a clear definition for what is a hallucination and what isn't.



https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=llm+...

https://www.medpagetoday.com/ophthalmology/generalophthalmol....

- [Survey of Hallucination in Natural Language Generation](https://arxiv.org/abs/2202.03629) (Ji et al., 2022)

- [How Language Model Hallucinations Can Snowball](https://arxiv.org/abs/2305.13534) (Zhang et al., 2023)

- [A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity](https://arxiv.org/abs/2302.04023) (Bang et al., 2023)

- [Contrastive Learning Reduces Hallucination in Conversations](https://arxiv.org/abs/2212.10400) (Sun et al., 2022)

- [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) (Wang et al., 2022)

- [SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models](https://arxiv.org/abs/2303.08896) (Manakul et al., 2023)



I also wonder if having a hallucination-free LLM is even required for it to be useful. Humans can and will hallucinate (by this I mean make false statements in full confidence, not drugs or mental states), and they're entrusted with all sorts of responsibilities. Humans are also susceptible to illusions and misdirection, just like LLMs. So in all likelihood there is simply some state of 'good enough' that is satisfactory for most tasks. Pursuing the elimination of hallucinations to the nth degree may be a fool's errand.


Tools are not people and people should not be considered as tools. Imagine your hammer only hitting the nail 60% of the time! But workers should be allowed to stop working to negotiate work conditions.


> I am generally skeptical whether such a highly formal approach can help us answer the practical question of whether we can get LLMs to answer 'I don't know' more often

I feel like writing an entire paper about the practical approach to the problems posed in this paper, but you'll probably have to first formally define the language used in the training data before you can try to map it (through training and sampling algos, which this paper conveniently skipped) to the target form. This sounds really fun at first, but then we're once again talking about the strict formalisation of natural language (which you could still do - the training data is limited and fixed!)



I can assure you it has no relevance for people working with LLMs, as the result includes your brain, for example.


> we can get LLMs to answer 'I don't know' more often

Have any nets been trained specifically to be able to go into an "I don't know" state, I wonder?

It may be the humans' fault.



Yes, you can find some of the work on this topic under the terms open world recognition or open world X where X is a topic in computer vision or NLP.

https://arxiv.org/abs/2011.12906



Maybe, but are the LLM churches doing it?


Saying “I don’t know” implies you understand what “I” means.


According to their definition, answering "I don't know" is also a hallucination. Even worse, the truth function is deliberately designed to trip up the models; it has no connection to any real-world truth. So, for example, if the input is "what is 2 + 5?" and the LLM answers "7", their truth function may say that's a hallucination and the correct answer is "banana".
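The construction being described is essentially diagonalisation, which can be sketched in a few lines (this illustrates the comment's reading of the setup, not the paper's exact formalism):

```python
def adversarial_truth(llm):
    """Diagonalisation sketch: given any fixed (computable) model, define a
    'ground truth' that disagrees with it on every input, so every answer
    the model gives counts as a hallucination by construction."""
    def truth(prompt):
        # Pick any answer the model did not give.
        return "banana" if llm(prompt) != "banana" else "apple"
    return truth

# A stand-in model (made up): answers one arithmetic question, else abstains.
llm = lambda p: "7" if p == "what is 2 + 5?" else "I don't know"
truth = adversarial_truth(llm)

for p in ["what is 2 + 5?", "anything else"]:
    assert llm(p) != truth(p)  # 'wrong' everywhere, by design
print("every answer is a hallucination w.r.t. the adversarial truth")
```

Under such a truth function, "7" and "I don't know" are both hallucinations, which is why the formal result says little about real-world accuracy.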

