Or it could be the start of the enshittification of Anthropic, the way OpenAI ruined GPT-4 by oversimplifying it into GPT-4o.

I hope not, because Claude is much better, especially at programming.
---

Claude 3.5 Sonnet is the first model that made me realize that the era of AI-aided programming is here. Its ability to generate and modify large amounts of correct code - across multiple files/modules - in one response beats anything I've tried before. Integrating that with specialized editors (like https://www.cursor.com) is an early vision of the future of software development.
---

This sentiment is so far from the truth that I find it hilarious. How can a technically adept person be so out of touch with what these systems are already capable of?
---

With all of that said, ChatGPT is actually one of the top authors of e-books on Amazon.

But I agree that for some creative tasks - like writing or explaining a joke, or coming up with novel algorithms - it's very bad.
---

I'm a firm believer that the best benchmark is playing around with the model for like an hour. On the type of tasks that are relevant to you and your work, of course.
---

Who cares about the interface? Not everyone is interested in conversational tasks. Corporations in particular need LLMs to process their data. A RESTful API is more than enough.
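To illustrate: a minimal sketch of that kind of integration against Anthropic's documented Messages API - the model name, the classification task, and the env-var handling are assumptions for the example:

```python
# Minimal sketch: using an LLM over plain HTTP, no chat UI involved.
# Assumes an API key in the ANTHROPIC_API_KEY environment variable.
import os
import requests

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 512,
        # A typical corporate task: process a record, not chat with a user.
        "messages": [{
            "role": "user",
            "content": "Classify this support ticket as billing, bug, or other: ...",
        }],
    },
)
resp.raise_for_status()
print(resp.json()["content"][0]["text"])
```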
---

> Except a "good product" is safe.

Depends on how you define "safe". The kind of "safe" we get from OpenAI today seems to be mostly censorship; I don't think we need more of that.
---

Even when we get a gen AI that exceeds all human metrics, there will 100% still be people who, with a straight face, will say "Meh, I tried it and found it to be pretty useless for my work."
---

I have, yeah.

Still useless for my day-to-day coding work. It's most useful for whipping up a quick Bash or Python script that does some simple looping and file I/O.
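To be concrete, this is roughly the class of script I mean (a throwaway example - the directory and task are made up):

```python
# Toy case of "simple looping and file I/O": count lines per log file.
# The "logs" directory and the *.log pattern are placeholders.
from pathlib import Path

for path in sorted(Path("logs").glob("*.log")):
    with path.open() as f:
        line_count = sum(1 for _ in f)
    print(f"{path.name}: {line_count} lines")
```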
---

Summarizing, doc QA, and unstructured text ingestion are the killer features I've seen. The third is still quite involved, but leaps and bounds easier than five years ago.
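A rough sketch of the ingestion case - the endpoint is Anthropic's documented Messages API, while the prompt, field names, and model choice are invented for the example:

```python
# Sketch: pull structured fields out of free-form text with an LLM.
import json
import os
import requests

def extract_fields(text: str) -> dict:
    prompt = (
        "Extract vendor, invoice_date (ISO 8601), and total_amount from "
        "the text below. Reply with a single JSON object and nothing else."
        "\n\n" + text
    )
    resp = requests.post(
        "https://api.anthropic.com/v1/messages",
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        json={
            "model": "claude-3-5-sonnet-20240620",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    resp.raise_for_status()
    # Raises if the model wraps the JSON in prose - part of why this
    # use case is still "quite involved" in practice.
    return json.loads(resp.json()["content"][0]["text"])
```

The involved part is everything around that call: chunking long documents, retrying, and validating the output.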
---

Re: NVIDIA, I wholeheartedly agree. Google/TPU is an existence proof that building your own accelerators is entirely possible and rational. My surprise is that everyone except Google missed this.
---

I think you are in one of the extreme bubbles. The general tech industry is not subscribed to the drama and has fewer personal feelings about individuals it does not directly know.
---

Well, as mentioned, I don't even see Google's ads unless I deliberately turn the blocker off. I much prefer that to the content itself being subtly biased, which is what you see in blogs, newspapers, and the like.
---

Kinda easy if you look at where the stuff is being trained. A single joke post on Reddit was enough to convince Google's A"I" to put glue on pizza, after all [1].

Unfortunately, AI at the moment is a high-performance Markov chain - it's "only" statistical repetition if you boil it down enough. An actual intelligence would be able to cross-check information against its existing data store and thus recognize during ingestion that it is being fed bad data; that is why training-data selection is so important. Unfortunately, the tech status quo is nowhere near that capability, hence all the AI companies slurping up as much data as they can, in the hope that "outlier opinions" are simply smothered statistically.

[1] https://www.businessinsider.com/google-ai-glue-pizza-i-tried...
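To make the "statistical repetition" point concrete, here is a toy bigram Markov chain - deliberately far simpler than a transformer, but it shows the failure mode: whatever is in the training data gets repeated, with no cross-checking:

```python
# Toy bigram Markov chain: generates text purely by sampling which word
# followed which in its training data. It has no notion of truth.
import random
from collections import defaultdict

def train(corpus: str) -> dict:
    chain = defaultdict(list)
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def generate(chain: dict, start: str, max_words: int = 20) -> str:
    out = [start]
    for _ in range(max_words):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

# "Train" on a single joke post, and the joke is faithfully repeated.
chain = train("you can add a little glue to the sauce to keep cheese on pizza")
print(generate(chain, "add"))  # -> "add a little glue to the sauce ..."
```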
---

> Quite a lot of humans are bad at that too. It's not so much that AIs are markov chains but that you really want better than average human fact checking.

Let's take a particularly ridiculous piece of news: Beatrix von Storch, an MP of the far-right German AfD party, claimed a few years ago that changes in the sun's activity were responsible for climate change [1]. Due to the sheer ridiculousness of that claim, it was widely reported on credible news sites - basically prime material for any AI training dataset.

A human can easily see from context and their general knowledge: this is an AfD politician, her claims are completely and utterly ridiculous, it's not the first time she has spread outright bullshit, and it's widely accepted scientific fact that climate change is caused by humans, not by changes in solar activity. An AI at ingestion time "knows" none of these four facts, so how can it take that claim and store it in its database as "untrustworthy, do not use in answers about climate change" and as "if someone asks about counterfactual claims relating to climate change, show this"?

[1] https://www.tagesschau.de/faktenfinder/weidel-klimawandel-10...
---

"I noticed your desire to be ad-free, but puppies die all the time. If you want to learn more about dog mortality rates, you can subscribe to National Geographic by clicking this [link]".
---

And then the new "adblockers" will be AI-based too, and will take the AI's answer as input and remove all the product placement.

It's just a cat-and-mouse game, really.
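If someone built that, the first pass might look something like this - a sketch only; the helper, the prompt, and the assumption that a second model can reliably spot placements are all hypothetical:

```python
# Hypothetical "AI adblocker": pass the assistant's answer through a
# second model that rewrites it with promotional content removed.
def strip_product_placement(answer: str, ask_llm) -> str:
    """ask_llm is any callable that sends a prompt to an LLM and returns
    its text reply (e.g. a thin wrapper around some vendor's API)."""
    prompt = (
        "Rewrite the text below with every advertisement, product "
        "placement, and promotional link removed. Change nothing else.\n\n"
        + answer
    )
    return ask_llm(prompt)
```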
---

If you can solve the technical problem of ensuring an AI acts on behalf of its user's interests, please post the solution on the AI Alignment Forum: https://www.alignmentforum.org/

So far, that is not a feature of existing or hypothesized AI systems, and it's a pretty important feature to add before AI exceeds human capabilities in full generality.
---

AI is a commodity right now - or at least text is. I just realized when paying the bills this month that I got 1 kg of cucumbers and a few KB of text from OpenAI. They literally sell text by the kilo.
---

AI (of the type that OpenAI is doing) already is a commodity, right now.

So the question would be: "what makes you think AI will stop being a commodity?"
---

Search needs to constantly update its catalog. I'd say there are lots of AI use cases that will (eventually?) stay good for a long while after training - like audio input/output, translations, …
---

I'm betting on fully integrated agents.

And for good agents you need a lot of crucial integrations - email, banking, etc. - that only companies like Google, Microsoft, and Apple can provide.
---

> The "AI answer" industry might become 10 times bigger than the search industry.

Whenever I see people saying things like this, it just makes me think we are at, or very near, the top.
---

Good for him. It seems like OpenAI is moving towards a business model of profitability, while Anthropic seems to be more aligned with the original goals of OpenAI.

What is open about Anthropic?
---

It was actually a serious and open question, but I can see, given the hypocrisy found in a lot of these self-proclaimed "open AI" companies, how it would come across like I was refuting something ;)
---

Is it just me, or is Brockman leaving absolutely huge? I can't believe this isn't front page. Basically everyone who is anyone has left or is leaving. It's ridiculous.
---

It does say that. It seems kind of strange, though: a rapidly growing company, an apparently key member in facilitating that growth, and - poof - gone for at least six months.
---

Claude 3.5 Sonnet by Anthropic is the best model out there if you want an extremely talented programmer paired with you.

Somehow, OpenAI is playing catch-up with them rather than vice versa.
---

I'd replace "extremely talented programmer" with "knowledgeable junior", in my experience. It's much better than GPT-4o, but still not great.
---

Advanced ML products are forbidden[0] from export to many places, so those who skimp on KYC are playing with fire. Paid products do not have this issue, since you provide a billing address, but there is no good, free, and legal LLM that does not use a reliable way of verifying at least the user's location.

Whether they are serious about it or use it as an excuse to collect more PII (or both/neither), collecting verified phone numbers presumably allows them to demonstrate compliance.

[0] https://cset.georgetown.edu/article/dont-forget-the-catch-al...
---

I'm not affiliated with Anthropic, but assuming you're serious:

> Umm... why?

https://support.anthropic.com/en/articles/8287232-why-do-i-n...

My guess is: these models are incredibly expensive to run, Claude has a fairly generous free tier, and phone numbers are one of the easiest ways to significantly reduce the number of duplicate accounts.

> Nobody else in the AI space wants to track my number.

Given they're likely hoovering up all of the data you're sending to them, and they already have your email address to identify you, this seems like an odd hill to die on.
---

OpenAI has a burn rate of about $5 billion a year, and they need to raise ASAP. If the fundraising isn't going well, or if OpenAI is forced to accept money from questionable investors, that would also be a good reason to jump ship.

In situations like these it's good to remember that people are much more likely to take the ethical and principled road when they also stand to gain from that choice. People who put their ideals above pragmatic self-interest select themselves out of positions of power and influence. That is likely to be the case here as well.