> At the end of the day we’re just a weighted neural net making seat of the pants confidence predictions too.

We might be. Or we might be something else entirely. Who knows?

---
We can put electrodes in the brain and in neurons in Petri dishes, and see with our own eyes their activating behaviors. Our brains are weighted neural networks. This is certain.

---
I wonder if that's the case - the prompt text (like all text interaction with LLMs) is seen from "within" the vector space, while sparsity is only observable from the "outside".

---
I think good, true, but rare information would also fit that definition, so it'd be a shame if it discovered something that could save humanity but then discounted it as probably not accurate.

---
> The easiest way to control this phenomenon is using the “hallucination” tokens, hence the construction of this prompt.

That's what I'm getting at. Hallucinations are well known, but admitting that you "hallucinated" in a mundane conversation is a rare thing to happen in the training data, so a minimally prompted/pretrained LLM would be more likely to say "Sorry, I misinterpreted" and then not realize just how grave the original mistake was, leading to further errors. Add the word "hallucinate" and the chatbot is only going to humanize the mistake by saying "I hallucinated", which lets it recover from extreme errors gracefully. Other words, like "confabulation" or "lie", are likely more prone to causing it to have an existential crisis.

It's mildly interesting that the same words everyone started using to describe strange LLM glitches also ended up being the best token to feed it to make it characterize its own LLM glitches. This newer definition of the word is, of course, now being added to various human dictionaries (such as https://en.wiktionary.org/wiki/hallucinate#Verb), which will probably strengthen the connection when the base model is trained on newer data.

---
> Probably for the best that users see the words "Sorry, I hallucinated" every now and then.

Wouldn’t “sorry, I don’t know how to answer the question” be better?

---
> ...mentions or cites particular articles, papers, or books, it always lets the human know that it doesn’t have access to search or a database...

I wonder if we can create a "reverse Google": a RAG / human-reinforced "GPT-pedia" where we dump "confirmed real" information that is always kept current, and all LLMs are free to harvest directly from it when crafting responses (rough sketch below). For example, it could accept the firehose of all current/active streams and podcasts of anything "live" and act like an AI TiVo for live streams, with a temporal window you can search through: "Show me every instance of [THING FROM ALL LIVE STREAMS WATCHED IN THE LAST 24 HOURS]; give me a markdown summary of the top channels, views, streams, comments, controversy, and retweets regarding that topic, sorted by time posted." (Recall the HNer posting "if youtube had channels": https://news.ycombinator.com/item?id=41247023.) Remember when Twitter gave its firehose directly to the Library of Congress? Why not point that kind of firehose dataset at a GPT: https://www.forbes.com/sites/kalevleetaru/2017/12/28/the-lib...
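A very rough sketch of that "GPT-pedia" lookup as a plain retrieval step, purely illustrative: the corpus, the `Doc` type, and the keyword scoring are all assumptions standing in for a real vetted database and embedding search.

```python
# Illustrative only: retrieve from a hypothetical corpus of "confirmed real"
# documents, then force the model to answer from those snippets alone.
from dataclasses import dataclass

@dataclass
class Doc:
    source: str       # e.g. a stream/podcast transcript or an archived post
    timestamp: float  # capture time, for a "last 24 hours" style window
    text: str

def retrieve(query: str, corpus: list[Doc], since: float, top_k: int = 5) -> list[Doc]:
    """Naive keyword-overlap scoring in place of a real embedding search."""
    recent = [d for d in corpus if d.timestamp >= since]
    words = set(query.lower().split())
    scored = sorted(recent,
                    key=lambda d: len(words & set(d.text.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(query: str, hits: list[Doc]) -> str:
    """The LLM only sees the vetted snippets it is handed."""
    context = "\n".join(f"[{d.source} @ {d.timestamp}] {d.text}" for d in hits)
    return (
        "Answer using ONLY the sources below, citing them.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```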
---
Personally still amazed that we live in a time where we can tell a computer system in pure text how it should behave and it _kinda_ works.

---
It actually still scares the hell out of me that this is the way even the experts 'program' this technology, with all the ambiguities arising from the use of natural language.

---
Keep in mind that this is not the only way the experts program this technology.

There's plenty of fine-tuning and RLHF involved too; that's mostly how "model alignment" works, for example. The system prompt exists merely as an extra precaution to reinforce the behaviors learned in RLHF, to explain some subtleties that would be otherwise hard to learn, and to fix little mistakes that remain after fine-tuning.

You can verify that this is true by using the model through the API, where you can set a custom system prompt. Even if your prompt is very short, most behaviors still remain pretty similar.

There's an interesting X thread from the researchers at Anthropic on why their prompt is the way it is at [1][2].

[1] https://twitter.com/AmandaAskell/status/1765207842993434880?...

[2] and for those without an X account, https://nitter.poast.org/AmandaAskell/status/176520784299343...
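For what it's worth, here is roughly what that check looks like with the Anthropic Python SDK; the model name and prompt text are just placeholders.

```python
# Minimal sketch: call Claude through the API with your own short system
# prompt instead of the published one. Assumes ANTHROPIC_API_KEY is set in
# the environment; the model name is an assumption.
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=512,
    system="You are a concise, helpful assistant.",  # custom system prompt
    messages=[{"role": "user", "content": "Summarize what RLHF is in two sentences."}],
)

print(message.content[0].text)
```

In practice most of the familiar behaviors survive even with a one-line system prompt like this, which is the point being made above.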
---
I think OP's point is that "hope" is never a substitute for "a battery of experiments on dependably constant phenomena and supported by strong statistical analysis."

---
I can have a conversation with LLMs. They can walk me through the troubleshooting without prior knowledge of programming languages.

That seems like a massive advantage.

---
... or, worse even, something you think is what you want because you don't know better, but which happens to be a wholly (or, worse, even just subtly partially incorrect) confabulated answer.

---
The original founders realised the weakness of Siri and started a machine-learning-based assistant which they sold to Samsung. Apple could have taken the same route but didn't.

---
| And "kinda" is an understatement. It understands you very well, perhaps even better than the average human would. (Average humans often don't understand jargon.) |
![]() |
Maybe in some cases. But generally speaking the consensus in the language translation industry is that NMT (e.g. Google Translate) still provides higher quality than current gen LLMs.

---
I'm not clear why you think that's closer?

The very first thing that happens in most LLMs is information getting deleted as the letters get converted into a token stream.
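A quick illustration of that deletion using OpenAI's open-source tiktoken tokenizer as a stand-in (Claude's tokenizer differs, but the principle is the same): the model receives chunk IDs, not individual letters.

```python
# Show that text is chunked into token IDs before the model ever sees it.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "How many r's are in strawberry?"
token_ids = enc.encode(text)

# The model operates on these integer IDs...
print(token_ids)

# ...each of which maps to a multi-character chunk of bytes, so single-letter
# structure is largely invisible unless a word happens to be split that way.
print([enc.decode_single_token_bytes(t) for t in token_ids])
```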
---
Nobody’s* claiming that. People are being imprecise with language, and others are imagining the claim and reacting.

* OK, someone somewhere is, but nobody in this conversation.

---
Odd how many of those instructions are almost always ignored (e.g. "don't apologize," "don't explain code without being asked"). What is even the point of these system prompts if they're so weak?

---
It's common for neural networks to struggle with negative prompting. Typically it works better to phrase expectations positively, e.g. “be brief” might work better than “do not write long replies”.
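A crude way to compare the two phrasings for yourself, assuming the Anthropic Python SDK (the model name is a placeholder, and response length is only a rough proxy for how well the instruction was followed):

```python
# Hypothetical A/B check: the same instruction phrased negatively vs. positively.
import anthropic

client = anthropic.Anthropic()

system_prompts = {
    "negative": "Do not write long replies. Do not apologize.",
    "positive": "Be brief. Skip apologies and get straight to the answer.",
}

for label, system in system_prompts.items():
    reply = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model name
        max_tokens=300,
        system=system,
        messages=[{"role": "user", "content": "Why might a Python import fail?"}],
    )
    print(label, len(reply.content[0].text))
```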
---
But surely Anthropic knows better than almost anyone on the planet what does and doesn't work well to shape Claude's responses. I'm curious why they're choosing to write these prompts at all.

---
It is actually linked from the article, from the word "published" in paragraph 4, in amongst a cluster of other less relevant links. Definitely not the most obvious.

---
After reading the first 2-3 paragraphs I went straight to this discussion thread, knowing it would be more informative than whatever confusing and useless crap is said in the article.

---
I was also pretty shocked to read this extremely specific direction, given my (many) interactions with Claude.

Really drives home how fuzzily these instructions are interpreted.

---
It's possible it reduces the rate but doesn't fix it.

This did make me wonder how much of their training data is support emails and chat, where they have those phrases as part of standard responses.

---
I believe that the system prompt offers a way to fix up alignment issues that could not be resolved during training. The model could train forever, but at some point, they have to release it.

---
We know that LLMs hallucinate, but we can also remove those hallucinations.

I’d love to see a future generation of a model that doesn’t hallucinate on key facts that are peer- and expert-reviewed, like the Wikipedia of LLMs. https://arxiv.org/pdf/2406.17642 is a paper we wrote digging into why LLMs hallucinate and how to fix it. It turns out to be a technical problem with how the LLM is trained.

---
Yep, there's a lot more to the prompt that they haven't shared here. Artifacts is a big one, and they also inject prompts at the end of your queries that further drive the response.

---
Publishing the system prompts and their changelog is great. Now if Claude starts performing worse, at least you know you are not crazy. This kind of openness creates trust.

---
Fine-tuning is expensive and slow compared to prompt engineering when making changes to a production system.

You can develop, validate, and push a new prompt in hours.
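A minimal sketch of what that fast iteration loop might look like, assuming a hypothetical versioned prompt store; this is illustrative and says nothing about how Anthropic actually ships prompts.

```python
# Hypothetical prompt registry: system prompts are just versioned text, so a
# change is a new entry plus an eval pass rather than a training run.
SYSTEM_PROMPTS = {
    "2024-07-12": "You are Claude... (previous wording)",
    "2024-08-26": "You are Claude... (adds the hallucination reminder)",
}

ACTIVE_VERSION = "2024-08-26"

def get_system_prompt(version: str = ACTIVE_VERSION) -> str:
    """Rolling back is a one-line change to ACTIVE_VERSION."""
    return SYSTEM_PROMPTS[version]
```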
> If Claude is asked about a very obscure person, object, or topic, i.e. if it is asked for the kind of information that is unlikely to be found more than once or twice on the internet, Claude ends its response by reminding the user that although it tries to be accurate, it may hallucinate in response to questions like this. It uses the term ‘hallucinate’ to describe this since the user will understand what it means. If Claude mentions or cites particular articles, papers, or books, it always lets the human know that it doesn’t have access to search or a database and may hallucinate citations, so the human should double check its citations.

Probably for the best that users see the words "Sorry, I hallucinated" every now and then.