(Comments)

Original link: https://news.ycombinator.com/item?id=41295923

The prompt appears to ask for an explanation of the following observation: "An LLM writes C++ code to compute a factorial using recursion." Here is a possible breakdown of the steps involved:

1. First, the LLM receives an instruction to write a C++ program that computes the factorial of a given number using recursion.

2. Next, the LLM retrieves prior knowledge about recursion and factorials. Recursion is a technique in computer science in which a function repeatedly calls itself on smaller instances of the overall problem until a base case is reached. The factorial of a number n is defined as the product of all integers between 1 and n.

3. To implement the requested solution, the LLM constructs a C++ function that takes a single parameter representing the input number. Inside the function, the LLM defines a local variable to store the factorial result and sets it to 1, since the factorial of 0 is defined as 1. The LLM then uses recursion to shrink the problem, calling the same function with a decremented value of the input number until the base case of 1 is reached. At each step, the current factorial result is multiplied by the newly computed result from the lower recursion depth.

4. When the base case of 1 is encountered, the LLM returns the accumulated factorial result, propagating the computation back up through the nested function calls until the initial caller finally holds the result value.

Here is the code example the LLM wrote:

```cpp
unsigned long long factorial(unsigned int n) {
    unsigned long long result = 1;
    for (unsigned int i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}
```

In summary, the LLM wrote a C++ program that uses recursion to efficiently compute the factorial of a given number. The approach consists of defining a function, reducing the problem size step by step, multiplying intermediate results, and returning the final result when the base case of 1 is reached.

Related articles

Original article


This should really be retitled to “The AI investment bubble is losing hype.” LLMs as they exist today will slowly work their way into new products and use cases. They are an important new capability and one that will change how we do certain tasks.

But as to the hype, we are in a brief pause before the election where no company wants to release anything that would hit the news cycle in a bad way and cause knee-jerk legislation. Are there new architectures and capabilities waiting? Likely some. Sora showed state of the art video generation, OpenAI has demoed an impressive voice mode, and Anthropic has teased that Opus 3.5 will be even more capable. OpenAI also clearly has some gas in the tank as they have focused on releasing small models such as GPT-4o and 4o mini. And many have been musing about agents and methods to improve system 2 like reasoning.

So while there’s a soft moratorium on showing scary new capability there is still evidence of progress being made behind-the-scenes. But what will a state of the art model look like when all of these techniques have been scaled up on brand new exascale data centers?

It might not be AGI, but I think it will at least be enough for the next hype investment bubble.



We'll see, but I doubt it's because of the election; as another commenter said, companies can't afford to lose that much money by waiting around for months for the right "moment" to release a product. GPT-4o is good, I'll grant you that, but it's fundamentally the same tech as GPT-3.5, and the fundamental problem, "hallucination," is not solved, even if there are more capabilities. No matter what, for anything besides code that may or may not be any good, someone has to go through and read the response and make sure it's all tight so they don't fuck up or embarrass themselves (and even then, using AI for coding will introduce long-term problems, since you'll have to go back through it for debugging anyway). We all, within a month of ChatGPT being introduced, caught it in a contradiction or just plain error about some topic of specialist knowledge, and realized that its level of expertise on all topics was at that same level. Sam Altman royally fucked up when he started believing his own bullshit, thinking AGI was on the horizon and all we needed was to waste more resources on compute time and datacenters.

It's done: you can't make another LLM, all knowledge from here on out is corrupted by them, and you can never deliver an epistemic "update." GPT will become a relic of the 21st century, like a magazine from the 1950s.



Even if the datacenter build-out turns out to be a waste because the potential of LLMs never pans out relative to the massive investment, I think the flood of GPUs and compute will definitely enable whatever could benefit from it. The next better AI application will be very thankful that GPU compute is abundant.



GPUs are easily and irreparably broken by overheating, so GPU compute is something that's high maintenance. It won't stick around like a building or a bridge.



I don’t think you’ve used LLMs enough. They are revolutionary, every day. As a coder I’m several times more productive than I was before, especially when trying to learn some new library or language.



They are revolutionary for use cases where hallucinated / wrong / unreliable output is easy and cheap to detect & fix, and where there's enough training data. That's why it fits programming so well - if you get bad code, you just throw it away, or modify it until it works. That's why it works for generic stock images too - if you get a bad image, you modify the prompt, generate another one, and see if it's better.

But many jobs are not like that. Imagine an AI nurse giving bad health advice on the phone. Somebody might die. Or an AI salesman making promises that are against company policy? The company is likely to be held legally liable, and may lose significant money.

For legal reasons, my company couldn't enable full LLM generative capabilities on the chatbot we use, because we would be legally responsible for anything it generates. Instead, the LLM is simply used to determine which of the pre-determined answers fits the query best, which it indeed does well when more traditional technologies fail. But that's not revolutionary, just an improvement. I suspect there are many barriers like that which hinder its usage in many fields, even if it could work most of the time.
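A minimal sketch of that "pick the closest pre-approved answer" pattern, with invented canned answers and a simple keyword-overlap scorer standing in for whatever model actually does the ranking, might look like this:

```python
# Hypothetical sketch: route a user query to one of several pre-approved answers.
# A real deployment would score candidates with an LLM or embedding model;
# the keyword-overlap scorer below is only a stand-in so the example runs.
import re

CANNED_ANSWERS = {
    "reset password": "You can reset your password from Settings > Security.",
    "refund policy": "Refunds are available within 30 days of purchase.",
    "contact support": "You can reach support at support@example.com.",
}

def score(query: str, topic: str) -> int:
    # Count how many words of the candidate topic appear in the query.
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    return sum(word in query_words for word in topic.split())

def best_canned_answer(query: str) -> str:
    topic = max(CANNED_ANSWERS, key=lambda t: score(query, t))
    return CANNED_ANSWERS[topic]

if __name__ == "__main__":
    print(best_canned_answer("How do I reset my password?"))
```

The legal point above is exactly that only the pre-written strings on the right-hand side ever reach the customer.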

So, nearly all use cases I can think of now will still require a human in the loop, simply because of the unreliability. That way it can be a productivity booster, but not a replacement.



> But many jobs are not like that. Imagine an AI nurse giving bad health advice on phone. Somebody might die.

This problem is not unique to AI; you see it with human medical professionals too. People are regularly misdiagnosed or not diagnosed at all. At least with AI you could compare the results of different models almost instantly and get confirmation. An AI doctor also wouldn't miss information on a chart like a human can.

> So, nearly all use cases I can think of now will still require a human in the loop, simply because of the unreliability. That way it can be a productivity booster, but not a replacement.

This is exactly what your parent said, yet you replied seemingly disagreeing. AI tools are here to stay and they do increase productivity, be it coding, writing papers, or strategizing. Those who continue to think of AI as not useful will be left behind.



To me those use cases are already revolutionary. And human in the loop doesn't mean it is not revolutionary. I see it multiplying human productivity rather than being an immediate replacement. And it can take some time before it is properly iterated and integrated everywhere in a seamless manner.



If you adjust your standard to the level of human performance in most roles, including nursing, you'll find that AI is reasonably similar to most people in that it makes errors, sometimes convincing ones, and that recovering from those errors is something all social/org systems must do & don't always get right.

Human in the loop can add reliability, but the most common use cases I’m seeing with AI are helping people see the errors they are making/their lack of sufficient effort to solve the problem.



Developer productivity doesn't map very directly to compensation. If one engineer is 10x as productive as another, they're lucky if they get 2x the compensation.



A good programmer isn't necessarily also good enough at business to run his own company.

Sure, you have your John Carmacks or antirezes of the industry, who are 10x programmers and also successful founders, but those guys are one in a million.

But the usual 10x engineer you'll meet is the guy who knows the ins and outs of all the running systems at work, giving him the ability to debug and solve issues 10x quicker than the rest. That knowledge is highly specific to the products of that company, is often non-transferable, and is also not useful for entrepreneurship.

Becoming the 10x engineer at a company usually means pigeonholing yourself in the deep inner workings of the products, which may or may not be useful later. If that stack is highly custom or proprietary, it might work in your favor, making you the 10x guy who is virtually unfireable and able to set your own demands since only you can solve the issues, or it might backfire against you in a round of layoffs, as your knowledge of that specific niche has little demand elsewhere.



> But your usual 10x engineer you'll meet is the guy who knows the ins and outs of all running systems at work giving him the ability to debug and solve issues 10x quicker than the rest

You're talking about the 100x engineer now. The 10x engineer is the normal engineer you are probably accustomed to working with. When you encounter a 1x engineer, you will be shocked at how slow and unproductive they are.



We'd all be millionaires if AI could actually help with that, but then if everyone is a millionaire then nobody is.

Current AI is still at the stage of recommending people jump off the Golden Gate Bridge if they feel sad or telling them to change their blinker fluid.



300% is a massive underestimate for someone who is AI native and understands how to prompt and interpret results correctly. In the past I would spend 30-40 minutes hunting around on StackOverflow to get the syntax of a difficult database query or bust out a long regular expression.

With AI I can do the same in 15 seconds. We’re talking a 120x increase in productivity not a 3x improvement.



Agreed, I don't know why people are so down on AI as a coding assistant here. I concur with everything you said, and will add that now I also have a service that will produce, on-demand, the exact documentation that I need at the exact level of abstraction, and then I can interrogate it at will. These things are huge time savers.



You can easily get 300% productivity improvements if you're using a completely new language but still have enough programming background to refine the prompts to get what you want, or if you're debugging an issue and the LLM points you in the right direction, saving you hours of googling and slamming your head against the wall.



No but I can work 2-3 hours a day (WFH) while delivering results that my boss is very happy with. I would prefer to be paid 3 times as much and working 8 hours a day, but I'm ok with this too.



Like with all productivity gains in history, this won't last too long once management realizes this and squeezes deadlines by 2-3x, since it will be expected for everyone to use LLMs at work to get things done 2-3x faster than before.



Additionally, there's a limit to how much "productivity gains" on the tech side can cause actual positive business results. Going from an ugly, slow website to a fresh one will only increase revenue a small percentage, same with building a better shopping cart, fixing some obscure errors, etc. The majority of the web already works pretty well, there is no flood of new revenue coming in as a result of the majority of tech innovation.

Which means that after a brief honeymoon period, the effect of AI will be to heavily reduce labor costs, rather than push for turbocharged "productivity" with greatly diminishing returns.



> I don’t think you’ve used LLMs enough. They are revolutionary, every day.

How much is "enough"? Neither myself nor my coworkers have found LLMs to be all that useful in our work. Almost everybody has stopped bothering with them these days.



Do you perhaps have some resources on how you use AI assistants for coding (I'm assuming GitHub Copilot)? I've been trying it for the past few months, and frankly, it's barely helping me at all. 95% of the time the suggestions are just noise. Maybe as a fast typist it's less useful; I just wonder why my experience is so different from what others are saying. So maybe it's because I'm not using it right?



Take a look at aider-chat or Zed. Zed just released new AI features; they had a blog post about it yesterday, I think.

Also you can look into Cursor.

There are actually quite a few tools.

I have my own agent framework in progress which has many plugins with different commands, including listing directories and trees, reading and writing files, running commands, and reading spreadsheets. So I can tell it to read all the Python in a module directory, run a test script, and compare the output to a spreadsheet tab, then ask it to come up with ideas for making the Python code match the spreadsheet better, and have it update the code and rerun the tests iteratively until it's satisfied.
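I can't speak for that framework, but the plugin/command idea it describes can be sketched roughly like this; the tool names, the dispatch shape, and the stubbed-out model call are all my own assumptions, not the commenter's code:

```python
import subprocess
from pathlib import Path

# Hypothetical sketch of an agent "tool" registry like the one described above.
# A real framework would send the tool list and conversation to a model and
# parse which tool the model wants to invoke next; that loop is omitted here.

def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} bytes to {path}"

def list_dir(path: str = ".") -> str:
    return "\n".join(sorted(p.name for p in Path(path).iterdir()))

def run_command(cmd: str) -> str:
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {
    "read_file": read_file,
    "write_file": write_file,
    "list_dir": list_dir,
    "run_command": run_command,
}

def dispatch(tool_name: str, **kwargs) -> str:
    # The agent loop would call this with whatever tool the model asked for.
    return TOOLS[tool_name](**kwargs)

if __name__ == "__main__":
    print(dispatch("list_dir", path="."))
```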

If I am honest about that particular process last night, I am going to have to go over the spreadsheet to some degree manually today, because neither gpt-4o nor Claude 3.5 Sonnet was able to get the numbers to match exactly.

It's a somewhat complicated spreadsheet whose domain I don't know anything about and am just grudgingly learning. I think the agent got me 95% of the way through the task.



I rely on LLMs extensively for my work, but only a part of that is with copilots.

I have Copilot suggestions bound to an easy hotkey to turn them on or off. If I'm writing code that's entirely new to the code base, I toggle the suggestions off; they'll be mostly useless. If I'm following a well-established pattern, even if it's a complicated one, I turn them on; they'll be mostly good. When writing tests in C#, I reflexively give the test a good name and write a tiny bit of the setup, then Copilot will usually be pretty good about the rest. I toggle it multiple times an hour; it's about knowing when it'll be good, and when not.

Beyond that, I get more value from interacting with the llm by chat. It’s important to have preconfigured personas, and it took me a good 500 words and some trial and error to set those up and get their interaction styles where I need them to be. There’s the “.net runtime expert” the “infrastructure and release mentor”, and on like that. As soon as I feel the least bit stuck or unsure I consult with one of them, possibly in voice mode while going for a little walk. It’s like having the right colleague always available to talk something through, and I now rarely find myself spinning my wheels, bike-shedding, or what have you.
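The persona setup described there is essentially a set of reusable system prompts. A rough sketch of how that might be wired up follows; the persona texts and the stubbed chat call are illustrative only, not the commenter's actual configuration:

```python
# Hypothetical sketch of "preconfigured personas" as reusable system prompts.
# send_to_llm is a stub; in practice it would call whatever chat API you use.

PERSONAS = {
    "dotnet_runtime_expert": (
        "You are a .NET runtime expert. Be precise about GC, JIT and "
        "threading behaviour, and say when something is version-specific."
    ),
    "infra_release_mentor": (
        "You are an infrastructure and release mentor. Favour boring, "
        "reversible changes and always ask about rollback plans."
    ),
}

def send_to_llm(messages: list[dict]) -> str:
    # Stand-in for a real chat-completion call.
    return f"(model reply to: {messages[-1]['content']!r})"

def ask(persona: str, question: str) -> str:
    messages = [
        {"role": "system", "content": PERSONAS[persona]},
        {"role": "user", "content": question},
    ]
    return send_to_llm(messages)

if __name__ == "__main__":
    print(ask("infra_release_mentor", "Should we ship this on a Friday?"))
```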



I think it's your mindset and how you approach it. E.g. some people are genuinely bad at googling their way to a solution, while others know exactly how to manipulate a Google search due to years of experience debugging problems. Some people will be really good at squeezing the right output out of ChatGPT/Copilot and utilize it to its maximum potential, while others simply won't make the connection.

Its output depends on your input.

E.g. say you have Swagger documentation for an API and you want to generate a TypeScript type definition from it: you just copy-paste the docs into a comment above the type, and Copilot auto-fills your TypeScript type definition, even adding ? for properties which are not required.

If you clearly define the goal of a function in a JSDoc comment, you can implement very complex functions. E.g. you define it in steps, and in the function body lay out each step. This also helps your own thinking. With GPT-4o you can even draw diagrams in e.g. Excalidraw, or take screenshots of the issues in your UI, to complement your question relating to that code.



> some people know exactly how to manipulate the google search due to years of experience debugging problems

This really rings true for me. Especially as a junior, I always thought one of my best skills was that I was good at Googling. I was able to come up with good queries and find some page that would help. Sometimes, a search would be simple enough that you could just grab a line of code right off the page, but most of the time (especially with StackOverflow) the best approach was to read through a few different sources and pick and choose what was useful to the situation, synthesizing a solution. Depending on how complicated the problem was, that process might have occurred in a single step or in multiple iterations.

So I've found LLMs to be a handy tool for making that process quicker. It's rare that the LLM will write the exact code I need - though of course some queries are simple enough to make that possible. But I can sort of prime the conversation in the right direction and get into a state where I can get useful answers to questions. I don't have any particular knowledge on AI that helps me do that, just a kind of general intuition for how to phrase questions and follow-ups to get output that's helpful.

I still have to be the filter - the LLM is happy to bullshit you - but that's not really a sea change from trying to Google around to figure out a problem. LLMs seem like an overall upgrade to that specific process of engineering to me, and that's a pretty useful tool!



Keep in mind that Google's results are also much worse than they used to be.

I'm using both Kagi & LLM; depending on my need, I'll prefer one or the other.

Maybe I can get to the same result with an LLM, but all the conversation/guidance required is more time-consuming than just refining a search query and browsing through the first three results.

After all, the answer is rarely available exactly as-is anywhere. Reading people's questions/replies provides clues to find the actual answer I was looking for.

I have yet to achieve this result through an LLM.



> E.g. you define it in steps, and in the function line out each step. This also helps your own thinking

Yeah, but there are other ways to think through problems, like asking other people what they think, which you can evaluate based on who they are and what they know. GPT is like getting advice from a cross-section of everyone in the world (and you don't even know which one), which may be helpful depending on the question and the "people" answering it, but it may also be extraordinarily unhelpful, especially for very specialized tasks (and specialized tasks are where the profit is).

Like most people, I have very specific knowledge of things that fewer than 100 people in the world know better than I do, but that thousands or even millions more have some poorly conceived general idea about.

If you asked GPT to give you an answer to a question, it would bias toward those millions, the statistically greater quantitative answer, over the qualitative one. But maybe GPT only has a few really good indexes in its training data that it uses for its response, and then it's extremely helpful, because it's like accidentally landing on a Stack Overflow response by some crazy genius who reads all day, lives out of a van in the woods, and uses public library computers to answer queries in his spare time. But that's sheer luck, and no more than a regular search will get you.



It is very helpful in providing highly specific "boilerplate" in languages/environments you are not very familiar with.

The text interface can also be useful for skipping across complex documentation and/or learning. Example: you can ask GPT-4 to "decode 0xdf 0xf8 0x44 0xd0 (thumb 2 assembly for arm cortex-m)" => this will tell you what instruction is encoded, what it does and even how to cajole your toolchain into providing that same information.
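One cheap way to double-check a decode like that, rather than taking the model's word for it, is a disassembler library. Assuming the capstone Python bindings are installed (pip install capstone), something along these lines should print the actual instruction:

```python
# Cross-check an LLM's "decode these bytes" answer with a real disassembler.
from capstone import Cs, CS_ARCH_ARM, CS_MODE_THUMB

code = bytes([0xdf, 0xf8, 0x44, 0xd0])  # the Thumb-2 bytes from the example above
md = Cs(CS_ARCH_ARM, CS_MODE_THUMB)

for insn in md.disasm(code, 0x0000):
    print(f"0x{insn.address:04x}: {insn.mnemonic}\t{insn.op_str}")
```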

If you are an experienced developer already, with a clear goal and understanding, LLMs tend to be less helpful in my experience (the same way that a mentor you could ask random bullshit would be more useful to a junior than a senior dev)



> this will tell you what instruction is encoded, what it does and even how to cajole your toolchain into providing that same information.

or it will hallucinate something that's completely wrong but you won't notice it



My experience is mostly with GPT-4. Treat it like a beginner programmer. Give it small, self-contained tasks; explain the possible problems, the limitations of the environment you are working with, and possible hurdles; and suggest API functions or language features to use (it really likes to forget there is a specific function that does half of what you need, instead of stapling multiple ones together). Try it for different tasks and you will get a feel for what it excels at and what it won't be able to solve. If it doesn't give a good answer after 2 or 3 attempts, just write it yourself and move on; giving feedback barely works in my experience.



Copilot is just an autocomplete tool; it doesn't have much support for multi-turn prompting, so it's best used when you know exactly what code you want and just want it done quickly, like implementing a well-defined function to satisfy an interface, refactoring existing code to match an example you've already written out, or prefilling boilerplate in a new file. For more complex work you need to use a chat interface where you can actually discuss the proposed changes with the model and edit and fork the conversation if necessary.



What language do you use?

If you can beat copilot in a typing race then you’re probably well within your comfort zone. It works best when working on things that you’re less confident at - typing speed doesn’t matter when you have to stop to think.



I do 120 wpm, but Copilot still outpaces me, and it is not just typing; it is the little things I don't have to think about. Of course I know how to do all of it, but it still takes some mental energy to come up with algorithms and code. It takes less energy to verify what Copilot outputs, at least for me.



I use C# for the most part, sometimes PowerShell. But I can certainly see how it's more useful when I don't know much of the API yet. Then it would be a lot of googling which the AI assistant could avoid.



My experience is similar. Most of the results are not really useful so I have to put work in to fix them. But at that point I can do the small extra step of doing it completely myself.



This comment doesn't deserve the downvotes it's getting; the author is right, and I'm having the same experience.

LLM outputs aren't always perfect, but that doesn't stop them from being extremely helpful and massively increasing my productivity.

They help me to get things done with the tech I'm familiar with much faster, get things done with tech I'm unfamiliar with that I wouldn't be able to do before, and they are extremely helpful for learning as well.

Also, I've noticed that using them has made me much more curious. I'm asking so many new questions now, I've had no idea how many things I was casually curious about, but not curious enough to google.



Good luck telling a bunch of programmers that their skills are legitimately under threat. No one wants to hear that. Especially when you are living a top 10% lifestyle on the back of being good at communicating to computers.

There is an old documentary of the final days of typesetters for newspapers. These were the (very skilled) people who rapidly put each individual carved steel character block into the printing frame in order to print thousands of page copies. Many were incredulous that a machine could ever replicate their work.

I don't think programmers are going to go away, but I do think those juicy salaries and compensation packages will.



The requirement for programmers is absolutely going to decline.

Some will undoubtedly transition to broader-based business consultancy services. For those unable or unwilling to do so, the future is bleak.



At least for now it seems more like a multiplier that won't reduce the amount of work out there, and may even increase demand in certain cases: as digitisation becomes easier, projects that weren't worth doing before will be now, and more complicated use cases will open up as well.

So same programmer with the same 8h of workday will be able to output more value.



> I do think those juicy salaries and compensation packages will.

I think that's inevitable with or without LLMs in the mix. I also think the industry as a whole will be better for it.



ChatGPT says

The documentary he's referring to is likely "Farewell, Etaoin Shrdlu," released in 1980. It chronicles the last day of hot metal typesetting at The New York Times before they transitioned to newer technology. The title comes from the nonsense phrase "etaoin shrdlu," which appeared frequently in Linotype machine errors due to the way the keys were arranged. The documentary provides a fascinating look at the end of an era in newspaper production.



> But as to the hype, we are in a brief pause before the election where no company wants to release anything that would hit the news cycle in a bad way and cause knee-jerk legislation.

The knee-jerk legislation has mostly been caused by Altman's statements though. So I wouldn't call it knee-jerk, but an attempt by OpenAI to get a legally granted monopoly.



They are definitely working on that, but it needs to be the right kind of knee-jerk legislation, something that gives them regulatory capture. They can’t afford to lose control of the narrative and end up regulated to death.



"no no, it's not that the technology has reached it's current limits and these companies are bleeding money, they're just withholding their newest releases not to spook the normies!"



Here's an experiment you can try. Go to https://www.udio.com/home and grab a free account, which comes with more than enough credits to do this.

1. Use a free chat LLM like Claude 3.5 Sonnet or ChatGPT 4o to workshop some lyrics that you like; just try a few generations and ask it to rewrite parts you don't like until you have something that you don't find too cringe.

2. Go back over to Udio, go to the Create tab, turn on the Manual Mode toggle, and type in only 3 or 4 comma-separated tags that describe the genre you like. Keep them very basic, like Progressive Rock, Hip Hop, 1995, Male Vocalist or whatever; you don't need to combine genres, these are just examples of tags.

3. Under the Lyrics section choose Custom, paste in just the chorus or a single verse from the lyrics you generated, and then click Create. It'll create two samples for you to listen to. If you don't like either of them, just click Create again to get another two, but normally it doesn't take too many tries to get something that sounds pretty good.

4. After you have one you like, click on the ... menu next to the song title and click Extend. You can add sections before or after; you just have to add the corresponding verse from the lyrics you generated, or choose Instrumental if you want a guitar solo or something.

You'll wind up with something pretty good if you really listen to each sample and choose the best one.

Music generation is one of the easiest ways to "spook the normies" since most people are completely unaware of the current SOTA. Anyone with a good ear and access to these tools can create a listenable song that sounds like it's been professionally produced. Anyone with a good ear and competence with a DAW and these tools can produce a high quality song. Someone who is already a professional can create incredible results in a fraction of the time it would normally take with zero budget.

One of the main limitations of generative AI at the moment is the interface, Udio's could certainly be improved but I think they have something good here with the extend feature allowing you to steer the creation. Developing the key UI features that allow you to control the inputs to generative models is an area where huge advancements can be made that can dramatically improve the quality of the generated output. We've only just scratched the surface here and even if the technology has reached its current limits, which I strongly believe it hasn't since there are a lot of things that have been shown to work but haven't been productized yet, we could still see steady month over month improvements based on better tooling built around them alone.

Text generation has gone from markov chain babblers to indistinguishable from human written.

Image generation has gone from acid trip uncanny valley to photorealistic.

Audio generation has gone from 1930's AM radio quality to crystal clear.

Video generation is currently in fugue dream state but is rapidly improving.

3D is early stages.

???? is next but I'm guessing it'll be things like CAD STL models, electronic circuits, and other physics based modelling outputs.

The ride's not over yet.



I see this pattern a lot, and I find it telling:

- someone claims that Gen AI is overhyped

- someone responds with a Gen AI-enabled service that

    1) is really impressive

    2) is currently offered pretty much for free

    3) doesn't have that many tangible benefits

There are many technologies for which it's very easy to answer "how does it improve the life of an average person": the desktop, the internet, the iPhone. I don't think Udio is anything like these. Long-term, how profitable do you expect a Udio-like application to be? Who would pay money to use this service?

It's just hard to imagine how you can turn this technology into a valuable product. Which isn't to say it's impossible: gen-AI is definitely quite capable and people are learning how to integrate it into products that can turn a profit. But @futureshock's point was that it is the AI investment bubble that's losing hype, and I think that's inevitable: people are realizing there are many issues with a technology that is super impressive but hard to productize.



I've tried Udio when it appeared, and, while it is spectacularly fascinating from the technical perspective, and can even generate songs that "sound" OK, it is still as cringe as cringe can be.

Do you have an example of any song that gained any traction among a human audience? Not a Billboard hit, just something that people outside the tech bubble accepted as a good song?



Have you tried the latest model? It's night and day.

Edit:

There's obviously still skill involved in creating a good song, it's not like you can just click one button and get a perfect hit. I outlined the simplest process in my first comment and specifically said you could create a "listenable" song, it's not going to be great but it probably rivals some of the slop you often hear on the radio. If you're a skilled music producer you can absolutely create something good especially now with access to the stemmed components of the songs. It's going to be a half manual process where you first generate enough to capture the feeling of the song and then download and make edits or add samples, upload and extend or remix and repeat.

If you're looking for links and don't care to peruse the trending section they have several samples on the announcement page https://www.udio.com/blog/introducing-v1-5



I think the frontpage/recommended sections on Udio and Suno both have some decent music these days. By decent I mean on the level one could expect from, say, browsing music on YouTube in areas one is not familiar with. There is of course a lot of meme/joke content, but also some pleasant/interesting-sounding songs.

The really good stuff probably will not be marked as made with AI - and will probably also go via a DAW and proper mastering.



I wrote a song for my girlfriend this way. It turned out pretty nice. A bit of quirkiness is not necessarily a bad thing when making personalized content. It took me a couple of hours to get it to my liking, including learning all the tools and the workflow for the first time, and fixing up a couple of mispronunciations of her nickname using inpainting. Overall a very productive environment, and I will probably try to make some more songs and replace the vocals with my own using the stems feature.

I have some audio engineering skills and dabbled in songwriting, guitar and singing when I was younger, but actually never completed a full song. So it is quite transformative from that perspective!



That's great to hear! It's uses like this that really make Udio and other tools shine. Even just making up silly songs about things that happen in your life is fun or doing them as a gift like you did is always nice. It's also great to have the option to add music to other projects.



The ride to what? The place where human musicians can't earn a living because they can't live on less than what it costs to have an AI regurgitate the melodic patterns, chord progressions, and other music theory it has learned? This benefits who, exactly? It's a race to the bottom. Who is going to pay anything for music that can be generated basically for free? Who is going to go to a concert or festival to listen to a computer? Who is going to buy merchandise? Are the hardware, models, and algorithms used going to capture the public interest like the personalities and abilities of the musicians in a band? Will anyone be a "fan" of this kind of music? Will there be an AI Freddie Mercury, Elton John, Prince, or Taylor Swift?



It sounds like you're arguing with yourself. You provide exactly the reasons why generative AI isn't going to take us to a "place where human musicians can't earn a living". It's my understanding that most small bands primarily earn their money from live performances and merchandise, gen AI isn't going to compete with them there, if anything it'll make it much easier for them to create their own merch or at least the initial designs for it.

AI generated music is more of a threat to the current state of the recording industry. If I can create exactly the album or playlist that I want using AI then why should I pay a record label for a recording that they're going to take 90% of the retail price from? The playlist I listen to while I'm working out or driving is not competing with live band performances, I'm still going to go to a show if there's a band playing that I like.



Yeah I didn't really state that very well. My point was mostly what you say: because people are fans of artists, and because AI music is/will be essentially free to produce, AI music isn't something that will make money for anyone, unless it's the default way anything online makes money: ads are injected into it. I'm not going to pay for it. I'm not going to buy merchandise, or go to concerts, or do anything a music fan does and pays money for. I'm not even going to put it in a workout playlist, because I can just as easily make a playlist of real human artists that I like.

I disagree that it's a threat to the recording industry. They aren't going to be able to sell AI music, but nobody else is either, because anyone who wants AI music can just create it themselves. Record labels will continue to sell and promote real artists, because that's how they can make money. That's what people will pay for.



Fair enough, but I'm not sure you're even going to realize whether you're listening to AI-generated music or not. One way of using these tools is to take lyrics and create several different melodies and vocal styles. An artist or a professional songwriter could do this and then either record their own version or pay musicians to perform it. That could be any combination of simply re-recording the vocal track, replacing or adding instrument tracks, making small modifications to some of the AI-generated tracks, etc. The song can then be released under the name of the singer, who can also go on tour in the flesh.

You might also just come across a 100% AI song on a streaming platform, enjoy it, and add it to a playlist. Who vets all of the music they listen to anyway? And if the producer manages the web presence of the "band" and provides a website, it would withstand a cursory search. You'd have to look closely to determine that there are no human band members other than the producer. For the types of electronic music that aren't usually performed live and are solely attributed to one artist it might be impossible to tell. The line would be especially blurry there anyway due to the current extensive use of samples and non-AI automation.

There are a lot more fuzzy edges too, you can use AI tools to "autotune" your own voice into a completely different one. You can tap out a quick melody on a keyboard and then extend, embellish and transform it into a full song. You could even do the full song yourself first and then remix it using AI.

The point I agree on would be that one-click hits are going to be few and far between for a while at least. If no effort is put into selecting the best then it's really just random chance. I'd be willing to bet that there will be an indie smash hit song created by a single person who doesn't perform any of the vocals or instruments within a year though. It'll get no play time on anything controlled by the industry titans but people will be streaming it regardless.



Yes, I do think it’s plausible. You’re talking Microsoft and Google. Two companies with extremely close ties to the US government. And Amazon is a major investor in Anthropic. It doesn’t take a conspiracy if the folks with majority stake in these companies are all friends and handshake agree over a steak dinner just off Capitol Hill. We live in a world where price fixing is a regular occurrence, and this is not so different.

I think we will see a very nice boost in capability within 6 months of the election. I don’t personally believe all the apocalyptic AGI predictions, but I do think that AI will continue to feed a nice growth curve in IT investment and productivity growth, similar to the last few decades of IT investment.



Your observation is spot on. LLMs represent a transformative capability, offering new ways to handle tasks that were previously more challenging or resource-intensive.



"there’s a soft moratorium on showing scary new capability"

Yes. There is also the Hype of the "End of the Hype Cycle". There is Hype that the Hype is ending.

When really, there is something amazing being released weekly.

People are so desensitized that, just because we don't have androids walking the streets or Blade Runner-like space colonies staffed with robots, they think AI is somehow over.



Nope. I couldn't care less about some elections (I care about global consequences, but there's no point wasting time & energy now; when it comes, it comes). That's US theater to focus the population on that freak show you guys made out of the election process, rather than on actually important stuff and concrete policies or actions.

What people, including me, are massively fed up with is all the companies (I mean ALL) jumping on the AI bandwagon in a beautiful show of how FOMO works and how even CEOs/shareholders are not immune to basic instincts. A literal hammer looking desperately for nails. Very few companies have amazing or potentially amazing products; the rest, not so much.

I absolutely don't want every effin' thing infused with some AI, since 1) it will be used to monitor my usage or me directly for advertising / credit & insurance scoring purposes, absolutely 0 doubt there; and 2) it may stop working once the wifi is down, the product is deprecated, or the company changes its policy (Sonos anyone?). Internet of Things hate v2.0.

I hate this primitive AI fashion wave: negative added value in most cases, 0 in the rest, yet users have to foot the bill. Seeing some minor industry crash due to unfulfilled expectations is just logical in such a case.



If anything, I'm getting more hyped up over time. Here are the things I've used LLMs for, with success in all areas as a solo technical founder.

Business advice, including marketing, reaching out to investors, understanding SAFE notes (follow-up questions after watching the Y Combinator videos), and customer interview design, all of which, as an engineer, I had never done before.

Create SQL queries for all kinds of business metrics including Monthly/Daily Active users, breakdown of users by country, abusive user detection and more.

Automated unit test creation. Not just the happy path either.

Automated data repository creation, based on a one-shot example and MySQL text output describing the tables involved. From this, I have super fast data repositories that use raw SQL to get/write data (roughly the shape sketched below).

Helping with challenging code problems that would otherwise need hours of searching google or reading the docs.

Database and query optimization.

Code Review. This has caught edge case bugs that normal testing did not detect.
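To make the SQL-metrics and data-repository items above concrete, here is a rough sketch of the kind of thing being described. The table and column names are invented, and sqlite3 is used only so the snippet is self-contained (the items above refer to MySQL):

```python
# Hypothetical sketch of an LLM-generated "data repository" that uses raw SQL.
# Table/column names (events.user_id, events.created_at) are assumptions;
# sqlite3 stands in for MySQL so the example runs without a server.
import sqlite3

class UserMetricsRepository:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def monthly_active_users(self, year_month: str) -> int:
        # year_month is "YYYY-MM"; counts distinct users with any event that month.
        row = self.conn.execute(
            """
            SELECT COUNT(DISTINCT user_id)
            FROM events
            WHERE strftime('%Y-%m', created_at) = ?
            """,
            (year_month,),
        ).fetchone()
        return row[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, created_at TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, "2024-07-01"), (1, "2024-07-15"), (2, "2024-07-20"), (3, "2024-06-30")],
    )
    repo = UserMetricsRepository(conn)
    print(repo.monthly_active_users("2024-07"))  # prints 2
```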

I'm going to try out aider + Claude 3.5 Sonnet on my codebases. I have heard good things about it and some rave reviews on X/Twitter. I watched a video where an engineer had a bug and described it to some tool (which wasn't specified, but I suspect aider); Claude then created a test to reproduce the bug and then fixed the code. The test passed; they then did a manual test and the bug was gone.



> Helping with challenging code problems that would otherwise need hours of searching google or reading the docs.

I'm glad this has been working for you -- generally any time I actually have a really difficult problem, ChatGPT just makes up the API I wish existed. Then when I bring it up to ChatGPT, it just apologizes and invents new API.



LLMs aren't good when you drift out of the training distribution. You want to be hitting the meat in the middle and leveraging the LLM to blast through it quickly.

That means LLMs are great for scaffolding, prototypes, the v0.1 of new code especially when it's very ordinary logic but using a language or library you're not 100% up to speed on.

One project I was on recently was translation: converting a JS library into Kotlin. In-editor AI code completion made this really quick: I pasted a snippet of JS for translation in a comment, and the AI completed the Kotlin version. It was frequently not quite right, but it was way faster than without. In particular, when there were repeated blocks of code for different cases that differed only slightly, once I got the first block correct, the LLM picked up on the pattern in-context and applied it correctly for the remaining blocks. Even when it's wrong, if it has an opportunity to learn locally, it can do so.



I've come to a similar conclusion, and realized that I'm slowly learning how to wield this new tool.

The sweet spot is when you need something and you're sure it's possible but just don't know how (or it's too time-consuming). E.g. change the CSS to do X, rewrite this Python code in TypeScript, use the pattern of this code to do Y, etc.

Reminds me of the early days of Google, where you had to learn how to write a good search query. You learn you need more than a word or two, but not to write a whole essay, etc.



Has to be something pretty generic. I'm trying to write a little C program that talks to an LCD display via the SPI bus--something I did before a few times, but not with this particular display and MCU. There is no LLM that can even begin to reason this out since they've been mostly trained on web dev content.



Is there no documentation about that uC and LCD controller? I'd assume they've been trained on every HTML, PDF, and video tutorial out there, as well as the PDFs of all of those chips' datasheets, and example microcontroller C code in the form of vendor documentation. Sure, that's maybe less than the amount of HTML web app tutorials, but if we assume it's been fed all of the vendor's documentation about a uC and their library documentation as well, the ones it would fail on are undocumented chips out of China that have no datasheets (which make for a very fun project to reverse engineer, mind you), or something very new. Even without that, though, it's still able to dot-product (not going to call it "reasoning") its way to at least hallucinating code to talk to an LCD controller chip via SPI for a uC it has never even heard of, so I can't agree with "even begin to reason this out".



You don't learn how to program SPI from reading documentation about an LCD controller. You need a lot more context and an understanding of how to string together basic operations, which are quite often not detailed in parts documentation.

I think that you have a serious misunderstanding of the capabilities of LLMs - they cannot reason out relationships among documents that easily. They cannot even tell you what they don't know to finish a given task (and I'm not just talking one-shot here, agent frameworks suffer from the same problem).



You keep making these claims that it's harder than it appears, but have yet to back them up. I'd be more than happy to update my understanding of the capabilities of these things if you can actually show me limitations of the technology. Until then, just saying SPI is harder than that doesn't convince me that it's my understanding that's wrong rather than yours; I've written the code for a microcontroller to interface to an LCD, at the advent of Google (so before Stack Overflow and ChatGPT), so I deeply know the frustrations of board bring-up.



Oh, you're that rcarmo? So cool! So help me understand, because that's what we're all here for, curiosity: with a bit more specificity, why is SPI to an LCD controller for this as-yet-unspecified microcontroller harder for an LLM than it seems? I buy that it is, but you've yet to give any sort of evidence, and appeal to authority is a logical fallacy.



I found that ChatGPT needs to be reined in with the prompts, and then it does a very impressive job. E.g. you can create a function prototype (with input and output expectations) and in the body spell out the logic you are thinking about in meta-code, then tell it to write the actual code. It's also good if you want to immerse yourself in a new programming language and outline what kind of program you want; expect the results to be different from what you thought, but insightful.
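Concretely, the pattern being described is to hand the model a scaffold like the one below (the function and the meta-code steps are an invented example, not anything from the comment) and then ask it to replace the comments with real code:

```python
# Example of the "prototype plus meta-code" prompt scaffold described above.
# You paste something like this into the chat and ask the model to fill in the body.

def summarize_orders(orders: list[dict], top_n: int = 5) -> dict:
    """Return a summary of a list of order dicts.

    Each order looks like {"customer": str, "total": float}.
    The summary should contain:
      - "revenue": sum of all order totals
      - "top_customers": the top_n customers by combined total, highest first
    """
    # 1. Accumulate a per-customer total in a dict.
    # 2. Sum all totals for "revenue".
    # 3. Sort customers by their combined total, descending, and keep top_n.
    # 4. Return the summary dict.
    raise NotImplementedError("ask the LLM to replace these steps with code")
```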

Now if you throw larger context or more obscure interface expectations at it, it'll start to discard code and hallucinate.



Do you provide code examples? In my experience, the more specific you get with your problem, the more specific the provided solutions are (probably a "natural" occurrence in LLMs). Hallucinated APIs are sometimes a problem for me, but then I just specify which API to use.



Why do you need an LLM if you know what you want it to do? Just write the code rather than wrangling with the LLM; it isn't like writing code takes much time when you know what it should do.



Not OP, but my response: because I am lazy and would like to save the 1-5 minutes it would take me to actually write it. When there are dozens of these small things a day, the saved time really adds up.



For me, it depends on the problem. I avoid LLMs for anything complex, because I prefer to think it through myself. But there are often times when you know exactly what you want and what it should look like. Let's say you need a simple web API to help you with a task. These days I'd typically ask an LLM to write the app. It will usually get some stuff wrong, but after a quick glance I can steer it to fix the problems (like: you didn't handle errors, etc.).

That way I can generate a simple few-hundred-line app in minutes. There is no way I could type that fast even if I knew exactly what characters to write, and that's not always the case. Oftentimes I know exactly what to do and I know it's OK when I see the code, but writing it would require me to look into the docs here and there.
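For the "simple web API" case, the kind of output being described is roughly the following. Flask is just one plausible choice here, not necessarily what the commenter uses, and the endpoints and in-memory "database" are invented for illustration:

```python
# Minimal sketch of the sort of small web API an LLM can draft in one shot.
from flask import Flask, jsonify, request

app = Flask(__name__)
TASKS: dict[int, str] = {}  # stand-in for a real datastore

@app.post("/tasks")
def create_task():
    data = request.get_json(silent=True)
    if not data or "title" not in data:
        # the kind of error handling you often have to nudge the LLM to add
        return jsonify({"error": "body must include 'title'"}), 400
    task_id = len(TASKS) + 1
    TASKS[task_id] = data["title"]
    return jsonify({"id": task_id, "title": data["title"]}), 201

@app.get("/tasks/<int:task_id>")
def get_task(task_id: int):
    if task_id not in TASKS:
        return jsonify({"error": "not found"}), 404
    return jsonify({"id": task_id, "title": TASKS[task_id]})

if __name__ == "__main__":
    app.run(debug=True)
```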



I find LLMs great for:

- Getting over the blank canvas hurdle. This is great for kick-starting a small project, and even if the code isn't amazing, it gets my brain to "start writing code and thinking about the algorithm/data structures/interesting problem" rather than being held up at "where to begin?", metaphorically where to place my first stroke. This helps somewhat.

- Sometimes the LLM has helped when I'm stuck on issues, but this is hit and miss. More specifically, it will often show a solution that jogs my brain and gets me there ("oh yeah, of course"). I've noticed I'm more in that state when tired and needing sleep, so the LLM might let me push a bit longer, making up for a tired brain. Honestly, though, this is probably more harmful: without the LLM I go to sleep and then, magically, like brains do, solve 4 hours of issues in 20 minutes after waking up.

So the LLM might be helping in ways that actually indicate you should sleep, as your brain is slooooowwwwing down.



Yes, this. I was skeptical and disgusted at a lot of what was being done or promised by using LLMs, but this was because I initially saw a lot of wholesale: "Make thing for me," being hyped or discussed. In practice, I have found them to be good tools for getting going or un-stuck, and use them more like an inspiration engine, or brain kick-starter.



I know it’s not a completely fair comparison, but to me this question is kind of missing the point. It’s like asking “Why take a cab if you know where you want to go?”



It's such a poor comparison it's ridiculous. A better analogy is "why take a cab if you know where you want to go and provide the car and instructions on how to drive"



It's nice that everybody is trying to help with the way you're prompting, but just use Bing Copilot or Phind for this, not ChatGPT.

It'll generate a bunch of queries to Google (well, "to Bing" I guess in that case) based on your question, read the results for you, base its answer on the results, and provide you with sources so you can check whether it used anything from that webpage.

I only use ChatGPT for documentation when I have no idea where I'm going at all, and I need a lay of the land on best practices and the way forward.

For specifics, Bing Copilot. Essentially a true semantic web search



I assume it knows the big stuff like the PyTorch API and major JS and React libs, then just paste the docs or even the implementation code for any libs it needs to know beyond that.



There was a movie that came out in 2001 called "Artificial Intelligence", at a time when we were still figuring out how things like search engines and the online economy were going to work. It had a scene where the main characters went to a city and visited a pay-per-question AI oracle. It was very artistically done, but it really revealed (in hindsight) how naive we were about how "online" was going to turn out.

When I look at the kinds of AI projects I have visibility into, there's a parallel where the public are expecting a centralized, all knowing, general purpose AI, but what it's really going to look like is a graph of oddball AI agents tuned for different optimizations.

One node might be slow and expensive but able to infer intent from a document, but its input is filtered by a fast and cheap one that eliminates uninteresting content, and it could offload work to a domain-specific one that knows everything about URLs, for example. More like the network of small, specialized computers scattered around your car than a central know-it-all computer.
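A toy version of that "graph of oddball agents" idea, with a cheap gate in front of an expensive node and a URL specialist off to the side, might look like the sketch below. Every function here is a stub standing in for a model or service; none of it comes from the projects mentioned above:

```python
import re

# Toy sketch of a pipeline of specialized "agents" as described above.

def cheap_filter(doc: str) -> bool:
    # Fast, cheap gate: drop obviously uninteresting content before the
    # expensive model ever sees it.
    return len(doc.split()) > 5 and "unsubscribe" not in doc.lower()

def url_specialist(doc: str) -> list[str]:
    # Narrow agent that only knows about URLs.
    return re.findall(r"https?://\S+", doc)

def intent_agent(doc: str) -> str:
    # Slow, expensive agent: in reality this would be an LLM call.
    return "billing question" if "invoice" in doc.lower() else "general inquiry"

def process(doc: str) -> dict | None:
    if not cheap_filter(doc):
        return None  # never pay for the expensive model
    return {"intent": intent_agent(doc), "urls": url_specialist(doc)}

if __name__ == "__main__":
    print(process("Hi, my invoice at https://example.com/inv/42 looks wrong, can you check?"))
```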



> When I look at the kinds of AI projects I have visibility into, there's a parallel where the public are expecting a centralized, all knowing, general purpose AI

I don't think this is entirely fair to "the public". Media was stuffed with AI company CEOs claiming that AGI was just around the corner. Nvidia, OpenAI and Musk, Zuckerberg, and others were positively starry eyed at how, soon, we'd all be just a GPU matmul away from intelligence. "The public" has seen these eye watering amounts of money shifting around, and they imply that it must mean something.

The entire system has been acting as if GenAI was right around the corner.



Maybe there's a term confusion here - GenAI has come to mean generative AI (LLMs, diffusion models, ...) rather than general AI. People call that AGI; now people also talk about AIS, which I take to mean "human level in a narrow domain only," while AGI is "generally intelligent at roughly human level."

My personal belief is that AIS is not a real thing (in the sense I wrote above) because narrow-domain competence is tightly coupled to general-domain competence. Even very autistic people that are functional in some domain actually have a staggering range of competences that we tend to ignore because we expect them in humans. I think machines will be similar.

Anyway, AGI or AIS is not around the corner at all. But that doesn't mean there isn't a lot of value to be had from generative AI now or in the near future. Will this be a small fraction of the value from Web 1.0 and Web 2.0? Will it be approximately the same? Will it be a multiple? I think that's the question. I think it's clear that assistants for software engineers are somewhat valuable now (evidence: I get value out of them). How valuable? Well, more than Stack Exchange, less than a good editor. That's still a lot, for me. I won't pay for it though...

And this points to the killer issue: there isn't a good way to monetize this. There isn't a good way to monetize the web, so we got adverts (a bad way). What will be the equivalent for LLMs? We just don't know right now. Interestingly there seems to be very little focus on this! Instead folks are studying the second-order value. Using this "free thing" we can drive productivity... or quality... increase opportunities... create a new business?



I was definitely confusing the terms. I was thinking of AGI, but I remembered that the G was for general, and GenAI "felt" right (probably because it's used in a similar enough context).

Replace all the instances of GenAI with AGI in my post.

It's an interesting observation that the economics aren't there yet. I think it's generally assumed that if we find something valuable, we can probably figure out how to monetize it. That's not necessarily true though. In the same but opposite vein, it doesn't necessarily need to be useful to stick around. It's possible AI is forever going to be useless (in objective terms, maybe it will make people less efficient) but find a monetization strategy that keeps it around (maybe it makes people feel good).

A ton of the technology economy isn't really based on objective metrics of usefulness. Microsoft isn't the biggest because they're the most useful. We don't look to the quality of Windows to understand if people will buy the next version. We don't look at Google's search results as an indicator of Google's profitability.



> The entire system has been acting as if GenAI was right around the corner.

To be clear, I think it is. It's just not going to be a hologram of a wizard in a room you can ask a question to for a quarter, which is what these chat bots and copilots you see today are modeled around.



FWIW, here are my results.

Every question I've asked of ChatGPT, Meta, and Gemini has returned results that were either obvious or wrong. Pointing out how wrong the answers were got the obvious "I apologize" response, which then returned another obvious answer.

I consider all these AI engines to be interactive search engines where the results need to be double checked. The only thing these engines do, for me perhaps, is save some search time so I don't have to click around on a lot of sites to scroll for some semblance of an answer to verify.



Every question?

If it’s returning results that were obvious, why were you asking the question?

And I don’t believe that the other ~50% were wrong.

> The only thing these engines do, for me perhaps, is save some search time so I don't have to click around on a lot of sites to scroll for some semblance of an answer to verify.

This sounds like a valuable service.



IME they basically rephrase the information I've put into them, rarely adding anything I didn't already imply I knew by my formulation of the question.

Something to keep in mind is that gambling rules apply: if enough people flip coins, there is always someone experiencing a winning streak, someone experiencing a losing streak, and a majority that gets a mixed bag of roughly breaking even (mediocre usefulness and a waste of time).

My first week of using GPT-4 every day, I experienced great answer after great answer, and I was convinced the world would change in a matter of months, that natural language translation was now a solved problem, etc.

But my luck changed, and now I get some good answers and some idiotic answers, so it's mostly not worth my time. Some people never get a good answer in their few dice rolls before writing off the technology.



Well, time to run it locally then :) Check out ollama.com. llama 3.1 is pretty crazy, especially if you can run the 405B one. Otherwise, use Mistral/Mixtral or something similar.



> The only thing these engines do, for me perhaps, is save some search time.

This. Saving time (or money, if you see them as the same) is the whole point actually.

Intelligence is supposed to be judged in the context of a shared goal or set of beliefs. In your case, and in the case of most humans, time and money are that context.

Are ant (insect) networks intelligent? Possibly; they do help millions of ants communicate quickly. But an individual ant barely has a brain.

Are beings that make decisions without conscious choice intelligent? Possibly, if they can escape death through amazing ability at any instant. But these beings don't have a frontal cortex that can make decisions by rational inquiry.



Debating this point is a bit like bikeshedding and beside the point. The point is that they can't think anything like humans do, but a network of them can seemingly do intelligent things. Intelligence is only about sharing goals and beliefs with an agent (the other ants in this example) and achieving them.



It's more like an ouija board: it only works if you believe, and phrase your input as if you expect a competent answer.

I know great programmers who approached it with skepticism, and the bot responds by being worthy of skepticism.



Most of it is mediocre creativity and a low willingness to adapt prompting patterns for an optimal ratio of effort to output quality. Most people who don't get it yet expect LLMs to read their minds, when they would be better off orienting themselves as students who have zero experience developing the arsenal of strategies that elicit productivity gains.

They haven't developed any intuition into which kinds of questions are worth prompting, which kinds of context are effective, or even which kinds of limitations apply to which models.



AI (specifically Claude Sonnet via Cursor) has completely transformed my workflow. It's changed my job description as a programmer. (And I've been doing this for 13y – no greenhorn!)

This wasn't the case with GPT-4/o. This capability is very new.

When I spoke to a colleague at Microsoft about these changes, they were floored. Microsoft has made themselves synonymous with AI, yet their company is barely even leveraging it. The big cos have put in the biggest investments, but also will be the slowest to change their processes and workflows to realize the shift.

Feels like one of those "future is here, not evenly distributed yet" moments. When a tool like Sonnet is released, it's not like big tech cos are going to transform overnight. There's a massive capability overhang that will take some time to work itself through these (now) slow-moving companies.

I assume it was the same with the internet/dot-com crash.



I feel like I'm living in a different universe sometimes. The consensus on HN seems to be that you can be pretty productive with LLMs as coding assistants, but every time I try I find it borderline impossible to get functional code even for pretty straightforward prompts.

I decided to fire up GPT-4o again today to see if maybe things have gotten better over the past few months.

I asked GPT to write code to render a triangle using Vulkan (a 3D graphics API). There are about 1000 tutorials on this that are almost certainly in GPT-4's training data. I gave GPT two small twists so it's not a simple case of copy/paste: I asked it 1) to apply a texture to the triangle and 2) to keep all the code in a single function. (Most tutorials break the code up into about a dozen functions, but each of these functions are called only once, so it should be trivial to inline them.)

Within the first ten lines, the code is already completely nonfunctional:

GPT-4o declares a pointer (VkPhysicalDevice) that is uninitialized. It queries the number of graphics devices on the host machine. A human being would allocate a buffer with that number of elements and store the reference in the pointer. GPT-4o just ignores the result. Completely ignores it. So the function call was just for fun, I guess? It then tries to copy an entire array of VkPhysicalDevice_T objects into this uninitialized pointer. So that's a guaranteed memory access violation right off the bat.
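
For reference, the usual two-call enumeration pattern that the generated code skipped looks roughly like this (a sketch only, assuming `instance` is a VkInstance created earlier in the same function, and that <vulkan/vulkan.h> and <vector> are included):

```cpp
uint32_t deviceCount = 0;
vkEnumeratePhysicalDevices(instance, &deviceCount, nullptr);         // first call: ask how many devices exist
std::vector<VkPhysicalDevice> devices(deviceCount);                  // allocate a buffer of exactly that size
vkEnumeratePhysicalDevices(instance, &deviceCount, devices.data());  // second call: fill the buffer
```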



Sometimes I think there's something wrong with me. I've used Copilot, I'm paying for ChatGPT, and we also have the JetBrains AI, and there's just something so off about all of this.

Some basic things are fine, but once you get into specialised things, everything gets terribly wrong and weird. I can't even put it into words. I see people saying how they are 10x more productive (I'd like to see actual numbers and proof for this), but I just don't see how. Maybe we're working on very custom stuff, or very specific things, but all of these tools give very deep-sounding and confident answers that are just plain wrong and shallow. Just yesterday I used GPT-4o for some basic help with Puppet, and the examples it printed, even though the task was quite basic, were just wrong, in the sense that I had to debug for 2 hours just to figure out how ridiculous the error was.

I fear that people will end up releasing unsafe, insecure and simply wrong code every day, code that they never debug and don't even understand, that maybe works for a basic set of inputs, but once the real world hits it, it will fail like those self-driving cars driving full speed into a trailer that has the same color as the road or sky.



I think there's a vast ocean of different software engineers and the type of code they write on a daily basis, which is why you get such differing views on AI's effectiveness. For me, AI has only ever been useful for very basic tasks and scripts: if I need a quick helper script for something in Python, a language that isn't my daily driver, AI usually gets me there in a couple of prompts. Or maybe I am writing some PowerShell/bash and forgot the syntax for something, and AI is quicker (or has more context) than a web search.

However, my main job is trying to come up with "elegant" architectures for complex business logic that interacts with an existing large code base. AI is just completely out of its depth in such cases due to lack of context but also lack of source material to draw from. Even unit tests only work with the most basic of cases, most of the time the setup is so complex it just produces garbage.

I've also had very little luck getting it to write performant code. I almost have to feed it the techniques or algorithms before it attempts to write such code, and even then it's usually wrong or not as efficient as it could be.



> my main job is trying to come up with "elegant" architectures for complex business logic that interacts with an existing large code base

Right, that's surely the main job of nearly every experienced developer. It's really cool that LLMs can generate code for isolated tasks, but they can barely even begin to do the hard work, and that seems very unlikely to change in the foreseeable future.



Indeed, if you need a lot of boilerplate that's pretty similar to existing commonly available code, you're set. However...

That code is probably buggy, slow, poorly architected, very verbose, and has logical issues where the examples and your needs don't exactly match.

Generally, the longer the snippet you want your LLM to generate, the more likely it's going to go off the rails.

I think for some positions this can get you 90% of the code done. For me this usually means I can get started very fast on a new problem, but the last remaining "10%" actually takes significantly longer and more effort to integrate, because I don't understand the other 90% off the top of my head :)



I have the same experience. I have my own benchmark, I take a relatively complex project like FreeSWITCH on github, which is part of the training set for all coding LLMs anyway so they should know it and I ask the AI to code small snippets, tests, suggestions and fixes to see how well it understands the codebase and the architecture.

I just tried the latest Cursor + Sonnet and it failed in every task. The problem is that there is no way to understand the code without either complete understanding of the domain and the intents or running it in some context.

Telecom and media domains in particular are well documented in specs and studied in forum discussions. I am sure they are part of the training data, because I can get most answers if asked directly. So far the LLMs fail to reason about anything useful for me.



I don’t use ChatGPT or chat AI for coding, at least not often. It does help me get unstuck on occasion, but most of the time the context switch is too much of a productivity sink.

However, I use copilot autocomplete consistently. That makes me much more productive. It’s just really good autocomplete.

As an experienced developer I’d put my productivity improvement at 2-3x. It’s huge but not 10x. I’m limited by my decision speed, I need to decide what I want the code to do, AI can’t help with that - it can only do the “how”.

Far from introducing more bugs, using Copilot frees some mental cycles for me to be more aware of the code I’m writing.



It's going to be the same issue as poorly-structured Excel workflows: non-programmers doing programming. Excel itself gets the blame for the buggy poorly-structured spreadsheets made by people who don't have the "programmer mindset" of thinking of the edge cases, exceptions, and extensibility.

So it will be with inexperienced people coding with LLMs.



Absolutely the same for me. Whenever I encounter a problem I'm pretty sure nobody else has encountered before (or has not written about it at least), ChatGPT writes complete nonsense.

On stuff that's a bit obscure, but not really (like your Vulkan example), ChatGPT tends to write 60-95% correct code that's flawed in the exact ways a noob wouldn't be able to fix.

In this case, nicking code from Github seems to fix the issue, even if I need to adapt it a bit.

Then comes the licensing issue. Often, when searching for an obscure topic, the code ChatGPT generates is very close to what's found on Github, but said code often comes with a non-permissive license, unlike what the AI generates.



> Then comes the licensing issue. Often, when searching for an obscure topic, the code ChatGPT generates is very close to what's found on Github, but said code often comes with a non-permissive license, unlike what the AI generates.

I think this is a feature for a lot of people. ChatGPT can launder code so they don't have to care about licenses.



Having spent a couple of hours with some llama models and others I've given up on them for code. Code generation from XML or grinding it out ad hoc with stuff like sed on top of a data or config file is faster and more convenient for me.

The Jetbrains thing is rather rudely incompetent, it consistently insists on suggestions that use variables and fragments that are supposed to be replaced by what I'm writing and also much more complex than what I actually need. I suffered through at least a hundred mistaken tabbed out shitty suggestions before I disconnected it.



I feel the same way. Anytime someone says they don't find LLMs all that useful, the exact same comments come out:

"They clearly aren't using the right model!"

"It's obvious they don't know how to prompt, or they would see the value."

"Maybe it can't do that today, but GPT-5 is just around the corner."

I feel more and more that people have just decided that this is a technology that will do everything you can imagine, and no evidence to the contrary will change their priors.



For me an LLM is a faster Stack Overflow without the fear of my question being closed. I know the first answer will not be what I want. I know that I will have to refactor it to suit my style. I know it will be full of subtle bugs.

Oh, and I expect it to be free; I ain't paying for this, just like I wasn't paying for Stack Overflow.

Finally, I hope that in a few years I will be able to just "sudo apt-get install llm llm-javascript llm-cooking llm-trivia llm-jokes" and it will all run locally on my low-end computer, and when I report a bug, six months later it will be fixed when I update the OS.



You are paying, the same way you paid for Stack Overflow: you become part of the process, ask follow-up questions, deepening the knowledge of the system.

The same applies to AI. The old learning material is gone; your interaction is now the new learning material and ground truth.

PS: Hourly rates for software engineers range from €11 to €213, so one hour on Stack Overflow, searching and sub-querying to resolve problems, costs you or your employer up to €213. It really depends on what you have negotiated.



And unlike Stack Overflow, which is available to everyone online and has an open content license, the IP that users of the ChatGPT-style services generate is entirely proprietary to the company. I am not interested in feeding their machine with my brainpower. On the other hand, I happily contribute to Stack Overflow and to open-source software and hardware. I do not think I will integrate an LLM into my engineering workflow until there is a good service/solution which builds up the commons. The huge companies already have way too much influence over key aspects of a knowledge-based society.



Every month, every update, every week, every contract change, every day - the setting is in another menu, in another castle, the setting is a princess, kiss the button, the button is now on, now off, is now dark patterned, now it isn't, deliver proof of work to set the setting, solve 1 captcha, solve 20.. come on.. its rodeo time.



I'm with you. Every time I've used LLMs in my work, I've ended up using more time tidying up after the robot than it would have taken to just do the work myself from the start. I can believe that there are some tasks that it can do very fast, but my experience is that if you're using it for anything that matters, you can't trust it enough to let it do it on its own, and so you just end up doing the work twice.

It's like having an unusually fast-but-clumsy intern, except interns learn the ropes fast and understand context.



More likely to be confirmation bias; most of us ask the wrong questions, try to confirm what we already believe rather than choose questions that may falsify our beliefs.

I have some standard tests for LLMs: write a web app version of Tetris, write a fluid dynamics simulation, etc., and these regularly fail (I must try them again on 4o).

But also, I have examples of them succeeding wildly, writing a web based painting app just from prompting — sure, even with that success it's bad code, but it's still done the thing.

As there are plenty of examples to confirm what we already believe, it's very easy to get stuck, with nay-sayers and enthusiasts equally unaware of the opposite examples.



I mean, when people say it doesn't work for them, would it kill them to give links to the chats on ChatGPT.com so everyone can see the prompts used? When they do, it's a different conversation: like the number of R's in strawberry, or 9.11 - 9.2. When the complaints are generic with no links, the responses are similarly generic, because both sides just have the bias that they're right and the other side is the one that's wrong.

I welcome people picking apart chats that I link to. It's not that I believe LLMs are magic and refuse to adjust my model of how good these things are and aren't, but when people don't give specific evidence it's hard to actually move the conversation forwards.

because yeah, these things are plenty stupid and have to be tricked into doing things sometimes (which is stupid, but here we are). they're also pretty amazing but like any hammer, not everything is a nail.



It codes great for me, helps me deliver faster, better tested and more features. It literally saves time and money every day. If you can't do that, maybe, just maybe, you have a you problem. But there are many like you in this thread so you are not alone.



The worst part about LLMs is this attitude it's giving to people who get a few helpful answers in a row

You're like a gambling addict who thinks he's smarter than everyone else



I feel the exact same way. I have felt 0 need to use an LLM in my current workflow. If I could explain concisely what my problem is in English - then I would already know the answer. In that case why would I be asking AI the question. And I don't want opinionated answers to subjective questions. Why run that through a filter, I can investigate better myself through links on the internet.

By the way, I think AI or ML has some valid uses right now, but mostly in the image processing domain, like recognizing shapes in some bounded domain. Generative images are not bad, but there's always this "AI glow" to each image; something is always off. These are neat tools, but it's a race to the bottom, and mostly users want to generate explicit content, let's be real, and they will become increasingly more creative and obtuse to get around the guards. Nothing is stopping you from entering the * industry and making tons of money; that industry is always doing well.

A friend recently suggested using AI to generate generic icons for my game. That's a really good use case, but does that radically change the current economy?

[BTW, generic stuff only until I could hire someone, because I prefer that experience way more. You can get more interesting results; four eyes are better than two.]



>> If I could explain concisely what my problem is in English - then I would already know the answer. In that case why would I be asking AI the question. And I don't want opinionated answers to subjective questions. Why run that through a filter, I can investigate better myself through links on the internet.

I am an experienced programmer, but I find myself (for the first time) doing a deep-dive in SQL and specifically building code that will run against multiple SQL engines.

I fed my list of engines into the AI. As I'm discovering the really weird corners of SQL, I ask the AI to compare one db against the other. My prompts are usually no more than 3 or 4 words.

It gives me quick helpful answers highlighting where things are the same and where they are different. I can then follow up in the specific docs if necessary (now that I know the function name.)

Personally I'm somewhat anti-hype, I'll let others rave about "changing the world". But I have found it a useful tool - not so much for "writing my code" but for acting as my tutor as I learn new things. (I'm using it for more than just SQL or computers now.)

I'm not sure it "changes the economy" - but it can certainly change individual lives. Some jobs will go away. Others might be easier to do. It might make it easier to learn new skills.



Recently I saw the process of filing 2 insurance claims through 2 different entities. The first one used a great AI voice agent that handles the process of filtering your query; it understood me perfectly. But then I still had to wait in line for an actual agent. OK, wait for 6 hours. Whatever. Calling again and going through the same agent a second time is painful. It's so slow. And all this just to connect me to a human, who was EXCELLENT. What did AI add? And for now a human has to be involved in any complex permission changes to your account. AND I like that.

I called the 2nd agency by phone. No AI. But it was excellent because they do async processing: you reserve a slot and they call you back. I don't care if those calls aren't answered urgently, because I just want to talk to a human.



AI lets the agency better be able to afford to hire and keep the excellent humans around to solve complex problems. It's better than a human that also can't fix your issue themselves and has to ask you to transfer to the other person. Ideally, the AI agent would have let you book a callback time with the specific right human who's best able to fix your issue. Some companies are able to do this, and you, the customer, never realize that of the 100 specialists in the building, you got connected to the one person who understands your issue and has the tools to fix it.

Customer service is hard because it has to filter out 90% noise, 9% fairly straightforward tasks, and the <1% of complex issues that need to be sent to the right human.



Don't get me wrong. What you're saying I actually support. If this allows that company to have quality staff, I'm all for it. The less time we both spend on each other the better. So just offer async processing.



Your point is that AI is bad at some things, and in some cases misused. Which is of course abundantly true.

But equally it doesn't prove, or even assert, the opposite. A bicycle may be bad at cross-country road trips, but that doesn't make it a bad choice for some other situations.

Hence my earlier comment - I (and I suspect others) are finding it useful for some tasks. That is not to imply it is good at all tasks.

Is it over-hyped? Of course yes. Welcome to IT where every new thing is over-hyped all the time.



I think LLMs are useful when you're trying to write something in a language you don't know well; then they speed up the part where you need to check for simple idiosyncrasies.

If you don't know the language at all it's dangerous, because you may not understand the proposed program (and of course if you're an expert you don't need it at all).

But LLMs won't help to find solutions to a general, still unspecified problem.



I've been very productive using LLMs, without any expectations of them "writing functional code". Instead, I've mostly used them as if I was working with a human research librarian.

For example, I can ask the LLM things like "What are the most common mistakes when using the Vulkan API to render a triangle with a texture?" and I'll very rapidly learn something about working with an API that I don't have deep understanding of, and I might not find a specific tutorial article about.

As another example, if I'm an experienced OpenGL programmer, I can ask directly "what's the Vulkan equivalent of this OpenGL API call?" and get quite good results back, most of the time.
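
To give a flavour of the kind of mapping such a question produces (a rough sketch of my own, not verbatim model output; it assumes `cmd` is a VkCommandBuffer in the recording state, `width`/`height` are ints, and the pipeline uses dynamic viewport state):

```cpp
// OpenGL: one call, state is global to the context.
glViewport(0, 0, width, height);

// Rough Vulkan equivalent: record a command into a command buffer.
VkViewport viewport{};
viewport.width    = static_cast<float>(width);
viewport.height   = static_cast<float>(height);
viewport.minDepth = 0.0f;
viewport.maxDepth = 1.0f;
vkCmdSetViewport(cmd, 0, 1, &viewport);
```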

So I'm asking questions where an 80% answer is still very valuable, and it's much faster than searching for documentation and doing a lot of comparison and headscratching, and it works well enough even when there's no specific article I could find in a variety of searches.

Anything better the technology gets from here just makes things even easier!



I was skeptical like you, but recently decided to try it out. I wasn't expecting much, and as such I was slightly surprised.

For example, just now my NAS stopped working because the boot device went offline. So I got to thinking about writing a simple syslog server. I've never looked at the syslog protocol before, and I've never done any low-level TCP/UDP work in C# yet.

So I asked ChatGPT to generate some code[1], and while the result is not perfect it's certainly better than nothing, and would save me time to get going.

As another example, a friend who's not very technical wanted to make an Arduino circuit to perform some automated experiment. He's dabbled with programing and can modify code, but struggles to get going. Again just for kicks, I asked ChatGPT and it provided a very nice starting point[2].

For exploratory stuff like this, it seems to provide a nice alternative to searching and piecing together the bits. Revolutionary is a quite loaded word, but it's certainly not just a slight improvement on what we had before LLMs and instead feels like a quantum leap.

[1]: https://chatgpt.com/share/f4343939-74f1-404d-bfac-b903525f61... (modified, see reply)

[2]: https://chatgpt.com/share/fc764e73-f01f-4a7c-ab58-f43da3e077...



This is what the AI is great at - the topics might be obscure to you but they are not that obscure in general, so the AI has had a lot of training data.

I've also assembled a Kubernetes cluster overnight, despite not knowing much about Kubernetes before, and I ran the kubectl files ChatGPT made for me past some devops folks, and it passed the smell test.

I consider much of coding to be a magic spellcasting tutorial - we do conceptually simple things, and the difficulty lies in figuring out how to use novel libraries and get them to do what you want.

Edit: After checking out the Arduino sketch, I'd take issue with all the floating point calculations in there - most microcontrollers don't have FPUs, and the performance is awful on 8-bit AVRs. It's not great on Cortex M3s either as all this stuff is done in software, and each FP operation is like a hundred cycles.

I'd definitely try to rephrase the issue with integer math. It might work, but no self-respecting embedded dev would write like this.
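
As a minimal sketch of what I mean (a hypothetical analogRead-to-millivolts conversion, not the actual code from the chat):

```cpp
// Floating-point version an LLM tends to produce:
//   float volts = analogRead(A0) * 5.0 / 1023.0;
// Integer-only version: keep the value in millivolts and never touch the (software) floating point.
long raw = analogRead(A0);             // 10-bit ADC reading, 0..1023
long millivolts = raw * 5000L / 1023;  // worst case 1023 * 5000 = 5,115,000, fits easily in 32 bits
```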



> the topics might be obscure to you but they are not that obscure in general

Exactly, it's a very nice alternative to searching the web and discovering new stuff.

> most microcontrollers don't have FPUs, and the performance is awful on 8-bit AVRs.

I used to think like you. But then I realized the ATmega328P is running at 16 MHz, so even hundreds of cycles per floating-point operation is still fast enough. As you can see here[1], it can do 94k double-precision FLOPS, more than enough for simple sketches like this. This jibes with the benchmarks I did several years ago.

Sure if I was writing some tight control loop or similar I wouldn't use floating point.

[1]: https://kreier.github.io/benchmark/LinpackDP/



Still it does not sit right with me - it might work, but figuring out the math with fixed point is not that hard, just requires some sitting down and thinking about it.

It's like I have some light fixtures in my attic that are connected with wires twisted together and covered with electrical tape - they certainly work and have done for a decade, but its still not right and I wouldn't recommend anyone do it this way.



I’m similar: LLM can improve my productivity by 10-20% when I’m working on something familiar, 30-50% when I’m venturing into a popular domain that I’m unfamiliar with.

I just don't understand where the hell this magical LLM is that's capable of generating flawless files or even entire projects that many people are talking about. I rarely accept a large block of LLM-generated code without close inspection, and I've ended up with a subtle bug that I wouldn't have written myself at least ~5 times now. Unless you don't give a shit about code quality, error handling, proper logging, and subtle bugs, you shouldn't run LLM-generated stuff without painstakingly reading and fixing everything. Or maybe there really is a magical LLM somewhere.



Very interesting. A couple of notes here on the C# version.

It's using the old format where the Program.cs file has an actual class, whereas as of .NET 6 that's not required.

You said barebones, but for any real server you would want to use the generic host https://learn.microsoft.com/en-us/dotnet/core/extensions/gen... which gets you a lot of the boilerplate and enables your program to be wrapped in a Windows or systemd service.

Finally, parsing can be simplified: since ASCII is a proper subset of UTF-8, you can just parse the entire string as UTF-8. IMHO I am disappointed that the AI didn't point that out.



> You said barebones, but for any real server you would want to use the generic host

True, I intentionally said barebones as I wanted a minimal example. I asked it to modify the code to use the generic host, and updated the chat link (so refresh). Keep in mind this is the free ChatGPT, but I still think it did reasonably good. The example compiles as-is, and is very close to functional. I've not used the generic host stuff before either, so again this would save me time searching and piecing together code.

> Finally, parsing can be simplified since ASCII is a proper subset of UTF-8, you can just parse the entire string as UTF-8.

I don't think that would work, because the free-form text message part at the end must contain a BOM if it's UTF-8 encoded, according to the specification. AFAIK you can't have the BOM in the middle of a string.



This is how I currently utilise and view the AI tools. I replaced the googling in my flow with them. It might have taken me 20 minutes before to get the boilerplate together; now it's a minute. Depending on the type of legacy code overheads you have, results may vary, and it helps if you can break your problem down into small discrete bits.

The way the OP here was talking about Sonnet being way above ChatGPT, in this case it could be true. Google probably has the largest Go codebases on search to train the AI on higher-quality inputs. Go is a simpler language with less variation over time compared to something like .NET, which also works in its favor.

I've always been the type of person to use the right language for each use case. For the last 10+ years I've primarily been building cross-platform apps that target every common OS. So these "AI" tools like phind.com give me a universal API interface and generate code which is equivalent to an SO answer. They have the ability of an outsourced junior dev who you would never let push code to prod, without the language barrier but retaining the fake-degree overheads ;)



My experience with CoPilot has been similar. I don't think it's given me a single piece of code that just worked. It always took several back and forth of me telling the "AI" that I need it to do X instead of Y (which was included in the original instructions but ignored).

It seems to work best if I start with something very simple, and then layer on instructions ("now make it do X").

Where I have found it saves me time is in having to look up syntax or "gotchas" which I would otherwise search StackOverflow for. But as far as "writing code" -- it still feels a long way from that.



I have a similar experience, but I still use LLMs, just a bit differently. I pretty much never ask it to generate complex code. I also rarely ask for definitions or facts, cause of the tendency to generate garbage answers. What I use it for is usually tedious stuff that is easy to do, but would take me more time to type, rather than ask the LLM.

For example:

* I need a simple bash script for file manipulation or some simple tasks like setting up a project (example: download a secret from AWS SSM, check if an executable exists, and if it doesn't, write instructions on how to install it on the most popular systems, etc.)

* I need a simple HTTP API, nothing fancy, maybe some simple database usage, maybe running some commands, simple error handling

* I need a YAML file for Kubernetes. I say what I need and usually, it gets most of it right

* I want an Ansible task for some simple thing. Ansible is quite verbose, so it's often saving me time

* I have a Kubernetes YAML file, but I want to manage it in terraform - I'll then ask to convert YAML to a terraform entry (and in general converting between formats is nice, cause even if you have only a piece of what you want to convert, LLMs will most of the time get it right)

* Surprisingly, it often gets openssl and ffmpeg commands right - something I always have to google anyway, especially openssl certificates generation or manipulation

* Giving it a function I wrote and writing test samples after providing a list of what it should test (and asking if it can come up with more, but sadly it rarely does generate anything useful on top of what I suggest)



Same here.

A friend, whose SQL knowledge is minimal, used an LLM to query data from a database over a couple of tables. Yes, after a lot of trial and error he (most probably) got the correct data; however, the only one able to read the query is the LLM itself. It's full of coalesces and subselects that repeat the same joins again and again.

LLMs will do a lot for you, but I really hate this "this will [already did] solve everything". No, it did not, because its quality is that of a junior dev, at best.



> because its quality is that of a junior dev, at best.

Echo chamber/bias I guess. I know many, many seniors with big paychecks working for big companies who are vastly worse than Sonnet at the moment. Juniors stand literally no chance unless very talented.



I understand a junior dev as someone who is missing (or has to a lesser degree) a sense of the maintainability, readability, and longevity of a solution.

I don't know what companies pay for and I don't care, because if we go by that, every definition of every word is arguable (since there's always someone somewhere outside the range of the definition).



Lots of companies are directly involved with LLMs, or working to leverage it for startups or their existing projects. And I think a fair chunk of all the employees working at these places probably post on HN (a crazy high percent of the recent batches of YC applicants were using LLM stuff, for instance). That's going to lead to a sizable number of perspectives and opinions that are not especially free of bias, simply because you want to believe that what you're working on is ultimately viable.

And I think it'd be extremely easy to convince oneself of this. Look at where 'AI' was 5 years ago, look at where it is today and then try to imagine where it will be in another 5 years. Of course you have to completely blind yourself to the fact that the acceleration has clearly sharply stalled out, but humans are really good at cognitive dissonance, especially when your perception of your future depends on it.

And there's also the point that even though I'm extremely critical of LLMs in general, they have absolutely 'transformed' my workflow in that natural language search of documentation is really useful. Being able to describe a desired API, but in an overly broad way that a search engine can't really pick up on, but that an LLM [often] can, is just quite handy. On the other hand, this is more a condemnation of search engine tech being frozen 20 years in the past than it is about an imminent LLM revolution.



Where LLMs shine is as a faster alternative to Google and Stack Overflow. "How do I reverse an array in language X?" This will give you the right answer in seconds without having to click through garbage.

Especially if it's a question that's hard to Google, like "I remember there is more than one way to split an array in this language, list them". This saves me minutes every day.

But it's especially helpful if you are working on projects outside your own domain where you are a newbie.
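
As a trivial illustration of the kind of answer that first question should produce (C++ here, purely as an example):

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5};
    std::reverse(v.begin(), v.end());        // in-place reversal from <algorithm>
    for (int x : v) std::cout << x << ' ';   // prints: 5 4 3 2 1
    std::cout << '\n';
}
```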



Something worth noting is that the parent comment refers to using Cursor, not ChatGPT/Claude.ai. The latter are general-purpose chat (and, in the case of ChatGPT, agentic) applications.

Cursor is a purpose-built IDE for software development. The Cursor team has put a lot of research and sweat into providing the LLMs it uses (also from OpenAI/Anthropic) with:

- the right parts of your code

- relevant code/dependency documentation

- and, importantly, the right prompts.

to successfully complete coding tasks. It's an apples-and-oranges situation.



> but every time I try I find it borderline impossible to get functional code even for pretty straightforward prompts.

I work in 9 different projects now and I would say that around 80% of the functional code comes from Sonnet (like GP) for these projects. These are not (all) trivial either; there is a very niche (for banking) key/value store written in Go, for instance, which has a lot of edge cases etc. All the plumbing (x, err = etc., aka stuff people find annoying) comes from Sonnet and works one-shot. A lot of the business logic comes from Sonnet too; it works but usually needs a little tweaking to make it correct.

Tests are all done by Sonnet. I think 80% is lowballing it on Go code, really.

We have a lot of complex code generator stuff and DSLs in TS which also works well often. Sometimes it gets some edge cases wrong, but either re-prompting with more details or fixing it ourselves, will do it. At a fraction of the time/money of what a fully human team would deliver.

I wrote a 3d editor for fun with Sonnet in a day.

I have terrible results with GPT/Copilot (Copilot is good for editing rather than complete files/functions; ChatGPT is not much good compared with Sonnet); it doesn't get close at all; it simply keeps giving me the same code over and over again when I say it's wrong; it hardcodes things it was specifically asked to make flexible, etc. Not sure why the difference is so massive all of a sudden.

Note: I use the sonnet API, not the web interface, but same for gpt so...



It was a few hours' work so it was not 'done properly', but the point was: it is working code and, as I know nothing about 3d or game programming, I didn't do any of it. People here claim it (current LLMs) cannot produce working code for entire programs without a human doing a lot of it; it clearly can, and for non-trivial stuff. I would say the real point is: it cannot do complete code for non-trivial programs without the human doing most of the coding IF the person prompting is not an experienced coder. I am, so I can review the generated code, prompt in English what it should change, and then it works. It is often faster to write the code myself instead (the thing blurts out 100% working code but with 20% wrong/buggy, which I fix as a human), but in domains I know nothing about, English is faster for me.



Essentially, to me it feels like almost all LLMs are somewhere between mediocre and terrible at anything resembling systems programming. I've especially had no luck with anything outside of web programming, and more so with anything that is NOT JavaScript / Python. (jeez I wrote this sentence terribly)



LLM coding assistants are more like another iteration of the jump from nothing -> Stack Overflow-type resources than a replacement for you doing coding work as a programmer just because you can phrase the task as a prompt. If you measured the value of resources like tutorials and Stack Overflow posts by blindly merging the first couple of hundred-line examples of the things you want to do, finding the result wasn't ideal, and declaring it a useless way to get functional code, people would (rightfully) scratch their heads at you when you say they are living in a different world. Of course it didn't work right away; perhaps it was still better than figuring all of the boilerplate out on your own the first time you did it?



For LOVR (Lua based 3D/VR framework), I found these LLMs pretty much useless, both ChatGPT and Claude. Seems all is trained on old APIs, so it takes quite a bit of effort to make any suggestions work with newest LOVR version.



I wonder if quality would improve if one uploaded the latest LOVR documentation and LOVR codebases using the newest version, and instructed it properly?



I had a decent experience putting together frontend code for a demo control system with Mixtral within a couple of days. I'm a seasoned programmer but I don't do JS. It stumbled a dozen times, but it fulfilled the task of letting me avoid learning JS.

However once you step outside JS or Python, the models are essentially useless. Comprehension of pointer semantics? You wish. Anything with Lisp outside its training corpus of homework assignments? LOL. Editing burden quickly exceeds any possible speed-up.



GPT seems to have gotten worse; Claude is the new hotness.

But, I agree with your sentiment that asking it to do stuff like that often doesn’t work. I’ve found that what it _can_ do is stuff like “here’s a Model object, write a query to fetch it with the schema I told you about ages ago”. It might not give perfect results, but I know how to write that query and it’s faster to edit Claude’s output than it is to write it from scratch.



I'd go one step further and suggest that one of the few things LLMs are good at is acting as astroturfing agents for themselves. I must be crazy to not be completely changing my workflow and job title with [THE_GOOD_ONE] and not that other one, and wow so many other people vocally feel the same way on every other forum that's extremely easy to manipulate.

Fwiw, I've had some helpful successful prompts here and there, and in some very narrow scopes I'll get something usable, like parsing JSON or scaffolding some test cases, which is real saved time, but I stopped thinking about these tools long ago.

To get real value out of something like your example, I'd be using it as a back and forth to help me understand how some concepts work or write example questions I can drill on my own, but nothing where precision matters



> I feel like I'm living in a different universe sometimes. The consensus on HN seems to be that you can be pretty productive with LLMs as coding assistants, but every time I try I find it borderline impossible to get functional code even for pretty straightforward prompts.

Same, it can't even fix an xcode memory leak bug in a simple app. It will keep trying and breaking it non-stop. Garbage



It's the same for me. It has to work 100%. I have 15+ years of coding experience, so I can code arguably fast. If I ask the LLM to code something, and it only partially works, then debugging and fixing the generated not-so-optimal code takes me more time than simply writing it from scratch.

There are also some more gotchas, like the generated code using slightly different package versions than the installed ones.



> The consensus on HN seems to be that you can be pretty productive with LLMs as coding assistants

If you define "productive" as writing a simple CRUD web application that your 13-year-old cousin could write between two gaming sessions, then you'll consider LLMs as sacred monsters.

Snake oil vendors always had great appeal over people who didn't know better.



That's because GPT products are in a different (much worse) universe to Anthropic's Sonnet/Opus, which are truly phenomenal.

Give Anthropic a shot (it's even better via the API: console.anthropic.com/workbench).

OpenAI is yesterday's news.



> I gave GPT two small twists so it's not a simple case of copy/paste

Why? I see it like querying a database of human knowledge. I wouldn't expect a SQL database to infer information it's never seen before, why would I expect an LLM to do so?

I use it where I know a solution exists but I'm stumped on the syntax or how to implement it in an unfamiliar environment, or I want to know what could have caused a bug based on others' experience etc.



Yes, me too! I don't have any stake in finding LLMs unhelpful, and would love to have a tool to make me more productive.

Would be really interesting if anyone had blog posts on their actual workflow with LLMs, in case there's something I'm doing different.



Try something like Maestro, which uses agents and an orchestrator. The looping quality checks have been very helpful. Claude-engineer, from the same developer, is also good for experiencing how superior that is to regular chat.



Yeah, these people claiming AI has been a transformative experience are just full of sh*t. I ask various models questions all the time because it's often better than googling, but all of them make a lot of silly mistakes. Frequently, it can be a bit of a process to get useful results.



I look at this contrast as what I call the difference between a "programmer" and a "software engineer". These jobs are really two different universes in practice, so you're not wrong.

I saw an LLM demo at one point where it was asked to write FFT and add unit tests for it which really drove this point home for me.

A programmer is a nicer term for a code monkey. You ask them to write FFT and they'll code it. All problems can be solved with more code. They can edit code, but on the whole it's more just to add more code. LLMs are actually pretty good at this job, in my experience. And this job is important; not all tasks can be engineered thoroughly. But this job has its scaling limits.

A software engineer is not about coding per se, it's about designing software. It's all about designing the right code, not more code. Work smarter, not harder, for scale. You ask them to write FFT and they'll find a way to depend on it from a common implementation so they don't have to maintain an independent implementation. I've personally found LLMs very bad at this type of work, the same way you and others relying to you describe it. (Ok, maybe FFT is overly simple, I'm sure an LLM can import that for you. But you get the idea.) LLMs have statistical confidence, not intellectual confidence. But software engineering generally works with code too complex for pure statistical confidence.

No offense to the LLM fans here, but I strongly suspect most of them are closer to the programmer category of work. An important job, but one more easily automated away by LLMs (or better software engineering long-term). And we can see this by how a lot of programming has been outsourced for decades to cheap labor in third-world countries: it's a simpler type of job. That plus the people biased because their jobs and egos depend on LLMs succeeding.



I stop it from coding, just ask it to brainstorm with me on a design or plan. After a few iterations it knows what I want and then I ask it for specific outputs. Or I code those myself, or some combination.

I’ve found that it’s sometimes amazing and sometimes wastes a lot of my time. A few times it’s really come up with a good insight I hadn’t considered because the conversation has woken up some non-obvious combination. I use ChatGPT, Claude, Perplexity and one or two IDE tools.



Counterpoint: I have gotten real value out of dumping broken SQL into ChatGPT and have it fix it for me. I could 100% have done that by myself, but it would have meant I have to go and google the right syntax.

AI is great for me, but it is more like a junior developer you are pairing with than a replacement.



I just signed up for the free version. Claude Sonnet does properly use malloc/free to manage the buffer where GPT-4o screws up (yay!). It manages to get through the whole process of initializing a graphics device and grabbing a queue from the device. It took some questionable shortcuts to get there (and didn't leave any comments explaining those shortcuts and the problems they could cause down the road), but fine, the code works.

After that it goes completely off the rails by trying to issue draw commands before binding a graphics pipeline, which is both illogical and illegal. After a lot of prodding, I did manage to get it to bind a graphics pipeline, but it forgot about the texture.

So Claude Sonnet is definitely better than GPT-4o, but it still feels raw, like a game of whack-a-mole where I can get it to fix a mistake, but it reintroduces an old one. I also have to be the one offering the expertise. I can prompt it to fix the issues because I know exactly what the issues are. If I was using this to try to fill in for a gap in my knowledge, I would be stuck when I ran the code and it crashed - I would have no idea where to go next.

Update: Took about 50 min of experimenting, but I did get Claude to generate code that doesn't have any obvious defects on first inspection, although it cut off about halfway through because of the generation limit. That's the best result I've seen from an LLM yet. But that's after about a dozen very broken programs, and again, I think the domain expertise here is key in order to be able to reprompt and correct.



I agree, that could explain a lot of it. I also suspect that the length of the generated code plays a role. In my experience, LLMs sometimes peter out a bit and give up if the generated program gets too long, even if it's well within their context limit. (Where giving up means writing a comment that says "the rest of the implementation goes here" or starting to have consistency issues.) Python and JavaScript tend to be more succinct and so that issue probably comes up less.



Yes, you have figured it out. LLMs are terrible for graphics programming. Web development - much better. Sonnet 3.5 is the only good model around for now. GPT 4o is very poor.



I've had some moderate success asking Claude to "translate" smallish pieces of code from, eg, C++ to Python. One simple C++ file parser it managed to translate basically 100% right on one try. I wouldn't particularly trust the output- well, not until I run it through tests- but for quick "could this possibly work, what does the performance look like, what might this code look like" exploratory stuff it's been very useful, especially for code that you are likely to throw away anyway.

One example that I am still using: I wanted to generate a random DICOM file with specific types of garbage in it to use as input for some unit tests, and Claude was able to generate some Python that grabs some random DICOM tags and shoves vaguely plausible garbage data into them, such that it is a valid but nonsensical DICOM dataset. This is not hard, but it's a lot faster to ask Claude to do it.



What did it do when you told it all of those things? Was it able to fix the problems when you pointed them out? Did you give it one prompt and expect perfect code out on the first try? Is that how you code? All your code compiles and runs flawlessly on the first try? I'm jealous. It usually takes me a bunch of passes before I get things right.

here's a chat for a uc and LCD chip that I picked at random (and got the name wrong for) (and didn't want raspberry pi code for so it stopped it short on that response)

https://chatgpt.com/share/2004ac32-b08b-43d7-b762-91543d656a...



One trick is to let it come up with what it wants (lots of functions, no texture), then run the code and give it the errors until that's fixed. Then ask it to inline the functions, then add the texture, etc.



Does your work not depend on existing code bases, product architectures and nontrivial domain contexts the LLM knows nothing about?

Every thread like this over the past year or so has had comments similar to yours, and it always remains quite vague, or when examples are given, it’s about self-contained tasks that require little contextual knowledge and are confined to widely publicly-documented technologies.

What exactly floored your colleague at Microsoft?



Context is the most challenging bit. FWIW, the codebases I'm working on are still small enough to where I rarely need to include more than 12 files into context. And I find as I make the context bigger beyond that, results degrade significantly.

So I don't know how this would go in a much larger codebase.

What floored him was simply how much of my programming I was doing with an LLM / how little I write line-by-line (vs edit line-by-line).

If you're really curious, I recorded some work for a friend. The first video has terrible audio, unfortunately. This second one I think gives a very realistic demonstration – you'll see the model struggle a bit at the beginning:

https://www.loom.com/share/20d967be827141578c64074735eb84a8



I know that you are getting some push-back because of your exuberance regarding your use of LLMs in development, but let me just say I respect that when someone told you to "put up or shut up" you did. Good on you!



So you spend 10 minutes writing a free-text description of the test you want and telling it exactly how you want it to write the test, then 4-5 minutes trying to understand whether it did the right thing or not, restart because it did something crazy, then spend a few minutes manually fixing the diff it generated?

MMmm.

I mean, don't get me wrong; this is impressive stuff; but it needs to be an order of magnitude less 'screwing around trying to fix the random crap' for this to be 'wow, amazing!' rather than a technical demonstration.

You could have done this more quickly without using AI.

I have no doubt this is transformative technology, but people using it are choosing to use it; it's not actually better than not using it at this point, as far as I can tell.

It's slower and more error prone.



Stoked you watched, thanks. (Sorry the example isn't the greatest/lacks context. The first video was better, but the mic gain was too high.)

You summed up the workflow accurately. Except, I read your first paragraph in a positive light, while I imagine you meant it to be negative.

Note the feedback loop you described is the same one as me delegating requirements to someone else (i.e. s/LLM/jr eng). And then reading/editing their PR. Except the feedback loop is, obviously, much tighter.

I've written a lot of tests, I think this would have taken 3-4x longer to do by hand. Surely an hour?

But even if all things were roughly equal, I like being in the navigator seat vs the driver seat. Editor vs writer. It helps me keep the big picture in mind, focused on requirements and architecture, not line-wise implementation details.



> I've written a lot of tests, I think this would have taken 3-4x longer to do by hand. Surely an hour?

It seems to me that the first few are almost complete copy-paste of older tests. You would have got code closer to the final test in the update case with simple copy-paste than what was provided.

The real value is only in the filtered test to choose randomly (btw, I have no idea why that’s beneficial here), and the one which checks that both consumers got the same info. They can be done in a few minutes with the help of the already made insert test, and the original version of the filtered test.

I’m happy that more people can code with this, and it’s great that it makes your coding faster. It makes coding more accessible. However, there are a lot of people who can do this faster without AI, so it’s definitely not for everybody yet.



> I've written a lot of tests, I think this would have taken 3-4x longer to do by hand. Surely an hour?

I guess my point is I'm skeptical.

I don't believe what you had the end would have taken you that long to do by hand. I don't believe it would have taken an hour. It certainly would not have taken me or anyone on my team that long.

I feel like you're projecting that, if you scale this process to, say, having 5 LLMs running in parallel, then what you would get is you spending maybe 20% more time reviewing 5x PRs instead of 1x PR, but getting 5x as much stuff done in the end.

Which may be true.

...but, and this is really my point: It's not true, in this example. It's not true in any examples I've seen.

It feels like it might be true in the near-moderate future, but there are a lot of underlying assumptions that is based on:

- LLMs get faster (probably)

- LLMs get more accurate and less prone to errors (???)

- LLMs get more context size without going crazy (???)

- The marginal cost of doing N x code reviews is < the cost of just writing code N times (???)

These are assumptions that... well, who knows? Maybe? ...but right now? Like, today?

The problem is: If it was actually making people more productive then we would see evidence of it. Like, actual concrete examples of people having 10 LLMs building systems for them.

...but what we do see, is people doing things like this, which seem like (to me at least), either worse or on-par with just doing the same work by hand.

A different workflow, certainly; but not obviously better.

LLMs appear to have an immediate, right-now disruptive impact on particular domains, like, say, learning, where it's extremely clear that having a wise coding assistant to help you gain simple cross-domain knowledge is highly impactful (look at Stack Overflow); but despite all the hand waving and all the people talking about it, the actual concrete evidence of a 'Devin' that actually builds software or even meaningfully improves programmer productivity (not 'is a tool that gives some marginal benefit to existing autocomplete'; actually improves productivity) is ...

...simply absent.

I find that problematic, and it makes me skeptical of grand claims.

Grand claims require concrete tangible evidence.

I've no doubt that you've got a workflow that works for you, and thanks for sharing it. :) ...I just don't think it's really compelling, currently, to work that way for most people; I don't think you can reasonably argue it's more productive, or more effective, based on what I've actually seen.



How worried are you that it's giving you bad advice?

There are plenty of, "Just disable certificate checking" type answers on Stack Overflow, but there are also a lot of comments calling them out. How do you fact check the AI? Is it just a shortcut to finding better documentation?



In my opinion it’s better at filtering down my convoluted explanation into some troubleshooting steps I can take, to investigate. It’s kind of like an evolved Google algorithm, boiling down the internet’s knowledge. And I’ve had it give me step by step instructions on “esoteric” things like dwm config file examples, plugins for displaying pictures in terminal, what files to edit and where…it’s kind of efficient I think. Better than browsing ads. Lol.



I think that Greptile is on the right track. I made a repo containing the C# source code for the Godot game engine, and its answers to "how do I do X" questions, where X is some obscure technical feature (like how to create a collision query using the Godot internal physics API), are much better than all the other AI solutions which use general training data.
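For anyone who hasn't touched Godot: a collision query against its internal physics API looks roughly like the sketch below. This is just a minimal illustration assuming the Godot 4.x C# bindings; the class name, node setup, and ray length are my own placeholders, not anything from the repo in question. The non-obvious part (the kind of detail a codebase-aware assistant helps with) is that the direct space state should be queried during the physics step.

```csharp
using Godot;

// Minimal sketch: a ray-cast collision query against Godot's internal
// physics API, run from the physics step where the space state is safe to use.
public partial class QueryExample : Node3D
{
    public override void _PhysicsProcess(double delta)
    {
        // Direct access to the physics space of the current 3D world.
        var spaceState = GetWorld3D().DirectSpaceState;

        // Cast a ray 10 units forward from this node's global position.
        var query = PhysicsRayQueryParameters3D.Create(
            GlobalPosition, GlobalPosition + Vector3.Forward * 10.0f);

        var result = spaceState.IntersectRay(query);
        if (result.Count > 0)
        {
            GD.Print("Hit: ", result["collider"]);
        }
    }
}
```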

However, there are some very frustrating limitations to Greptile, so severe that I basically only use it to ask implementation questions on existing codebases, not for anything like general R&D: 1) answers are limited to about 150 lines; 2) it doesn't re-analyze a repo after you link it in a conversation (you need to start a new conversation, re-link the repo, then wait 20+ minutes for it to parse your code); 3) it is very slow (maybe 30 seconds to answer a question); 4) there's no prompt engineering.

I think it's a bit strange that no other ai solution lets you ask questions about existing codebases. I hope that will be more widespread soon.



I work at Greptile and agree on all three criticisms. 1) is a bug we haven't been able to fix, 2) has to do with the high cost of re-indexing (we will likely start auto-updating the index when LLM costs come down a little), and 3) has to do with LLM speed. We pushed some changes to cut time-to-first-token by about half, but there's a long way to go.

Re: prompt engineering, we have a prompt guide if that helps, was that what you are getting at?

https://docs.greptile.com/prompt-guide



No idea about the product, but I would like to congratulate you guys on what is maybe the greatest name ever. Something about it seems to combine "fierce" with "cute", so I think you should consider changing your logo to something that looks like reptar



Not-so-subtly mocking the top-level for not replying "yet", when they replied almost immediately after with a video of the relevant workflow, was not a move that made you look smart or nice.



>when they replied almost immediately after with a video of the relevant workflow

Wow. Such wrong claims.

I had already replied to you in a sibling comment, refuting your points, but will give one more proof (not that I really need to):

_acco, the top level commenter relevant to this discussion, commented at some time, say x.

layer8 commented, replying to _acco, 7 hours ago (as can be seen on the page at the time of my writing this comment, i.e. right now).

I then replied to layer8, 6 hours ago.

_acco replied back to layer8 5 hours ago.

All this is visible right now on the page; and if people check it a few hours later, the relative time deltas will remain the same, obviously. (But not if they check after 24 hours, in which case all comments will show as one day ago.)

So there was a 1 hour gap between layer8's comment and mine, and a 2 hour gap between layer8's comment and _acco's reply.

If you think 2 hours is the same as "almost immediately", as you said above, I have nothing more to say to you, except that our perceptions of time are highly different.



I meant immediately after your reply. At the time I posted, your and acco_'s replies to layer8 both showed as "3 hours" ago. Now they both show as "13 hours ago". Really, I'm being generous in assuming they didn't reply before you.

Ed: ah, since the time I wrote this comment, your respective comments are now at 14 and 13 hours. Congrats on your <1hr lead.



By god, andrewflnr. Very "nice". /s. See point 4 below.

You just showed that you are inaccurate, pompous and fake, all of that, in one single comment of yours, above. How? I'll tell you:

1. inaccurate: That commenter's username (the one who started this subthread) is _acco, not acco_ as you wrote above.

Check that in their comment, or here:

https://news.ycombinator.com/user?id=_acco

I was careful to check the spelling before mentioning their name, unlike you, even when I referred to them earlier. The fact that you cannot even get the position of an underscore in a name correct seems to indicate that you are sloppy, which leads me to my next point.

2. pompous:

You said:

>Really, I'm being generous in assuming they didn't reply before you.

This is the pompous bit. Generous? Laughable.

I neither need nor want your generosity. If anything, I prefer objectivity, and that people give others the benefit of the doubt instead of assuming bad intentions: I had actually checked for a second comment by _acco (in reply to layer8) just before I wrote my comment to layer8, the one that got all of you in a tizzy. But you not only got the times wrong (see your edit, and point 3 below), but also assumed bad faith on my part.

3. fake.

You first said above that both those replies to layer8 showed as 13 hours ago, then edited your comment to say 14 and 13 hours. It shows that you don't use your brains. The feature of software showing time deltas in the form of "hours ago" or "days ago", versus an exact time stamp, is pretty old by now. It dates back to Web 2.0 or earlier; maybe it was started by some Web 2.0 startups or by Google.

If you think you are so clever as to criticize me without proof, or say that you are generous in your assumptions about me, you should have been equally clever or generous about the time delta point above, and so realized that I could have replied to layer8 before _acco, which was indeed the case. Obviously I cannot prove it, but the fact that I got _acco's name correct, while you did not, lends credence to my statement. It shows that I took care while writing my comment.

4. So you are fake because you don't bother to think before bad-mouthing others, and even more fake because you did not apply (to yourself) your own made-up "rule" in this other comment of yours, where you criticized my comment as being neither smart nor nice, so not of value:

https://news.ycombinator.com/item?id=41310460

I should not have had to write such a long comment to refute your silly and false allegations, and I will not always do that, but decided to do it this time, to make the point.

And, wow: you managed to pack 3 blunders (being inaccurate, pompous and fake) into a comment of just a few lines. That's neither smart nor nice. Instead, it's toxic.



Actually, your inaccurateness (inaccuracy? GIYF) is even worse than I said above. My comment a few levels above literally uses the name _acco at least four times - I checked just now. And your comment was in reply to that. So even after reading that person's name four times in my comment, you still got its spelling wrong. Congrats. (Yeah, I can snark too, like you did to me upthread.)



2 hours in a discussion forum, where the discussion spans days or sometimes weeks, is certainly an "almost immediate" response.

Perception of time is subjective.



Err ... you realize that your argument just shot itself in both left feet?

I just love it (not!, but it happens every now and then) when HN users display ignorance of basic logic, on a site for techies, yet.

I could use your own argument against you:

>is certainly an ”almost immediate” response

>Perception of time is subjective.

So, you use "certainly", and then "subjective", just after it, in the same argument about the same topic?

Brilliant. Do you not realize that by your last sentence above, you make my own case for me?



Hey.

These are the four simple lines that I wrote above:

>Solid questions and comments, layer8.

>I notice that the person you replied to has not replied to you yet.

>It may be that they will, of course.

>But your points are good.

(Italics mine, and they were not in my original comment.)

You, above:

>when they replied almost immediately after with a video of the relevant workflow,

I did check the time intervals between the top level comment and layer8's comment, before my first reply. It is over an hour now, so I cannot see the exact times in minutes any more, but IIRC, there was a fairly long gap (in minutes). And I also think I noticed that the top level person did reply to someone else, but not to layer8, by the time I wrote my comment.

So I don't see anything wrong in what I said. I even said that they may reply later.

You consider that to be:

>"Not-so-subtly mocking"?

Jeez. I think you are wrong.

Then I have nothing further to say to you, except this:

>was not a move that made you look smart or nice.

Trying to look smart or nice is not the goal in online discussions. At least, I don't think so. You appear to think that. The goal (to me) is to say what you think; otherwise, why write at all? I could just get an LLM to write all my comments, and not care about its hallucinations.

I don't try to be smart or nice, nor the reverse. I just put my considered thoughts out there, just like anyone else. Obviously I can be right or wrong, just like anyone else can be. And some points can be subjective, so cannot be said to be definitely either right or wrong.



If a comment is not at least one of smart or nice, it's a waste of space and attention. That may not be your purpose, but don't act shocked when people respond with negativity.



>You were being passive aggressive and adding nothing to the discussion.

What arrant nonsense!

Don't accuse people falsely or without substantiating your accusation.

My comment being referred to above, consisted of four simple short lines.

I am not going to bother to paste them here again for a third time.

Those HN readers who wish to do so can go up the few needed levels in this thread and check for themselves:

Nothing I said in those four lines can be interpreted as being passive-aggressive.

I say that you do not know what the heck you are talking about, if you claim that I was passive-aggressive in that comment.



I see so many people hyping Claude Sonnet + Cursor on Twitter/X, yet in real world usage, I find it no better than GitHub Copilot (presumably GPT 4o) + VScode.

Cursor offers some super marginal UX improvements over the latter (being that it’s a fork of VScode), since it allows you to switch models. But Claude and GPT have been interchangeable at least for my workflows, so I’m not sure the hype is really deserved.

I can only imagine the excitement comes from the fact that cursor has a full-fat free trial, and maybe most people have never bothered paying for copilot?



Hmm, I've definitely always used paid copilot models.

Perhaps it's my language of choice (Elixir)? Claude absolutely nails it, rarely gives me code with compilation errors, seems to know and leverage the standard library very well, idiomatic. Not the same with GPTs.



In my experience ChatGPT seems particularly bad at Elixir, presumably because there is a comparative lack of published code and discussion about it.



I think my general perception is that AI is a great assistant for some occupations like software engineering, but due to its large room for error it's very impractical for the majority of business applications that require accuracy. I'm seeing this trend at my company, which operates in the medical field and recently mandated that all engineers use CoPilot. At the same time it's a struggle to see where we can improve our business processes with AI - outside of basic things like transcriptions and spell checking - without getting ourselves into a massive lawsuit.



I've been using (GitHub) Copilot and ChatGPT since they've been widely available. I started using ChatGPT for coding after 4 came out.

I was an early advocate for Copilot but, honestly, nowadays I really don't find it that useful, compared to GPT-4o via ChatGPT.

ChatGPT not being directly integrated into my editor turns out to be an advantage. The problem with Copilot is it gets in the way. It's too easy to unintentionally insert a line or block completion that isn't what you want, or is out and out garbage, and it's constantly shoving up suggestions as I type, which can be distracting. It's particularly irritating when I'm trying to read or understand a piece of code, or maybe do a refactor, and I leave my caret in one position for half a second too long, and suddenly it's ghost-inserted a block of code as a suggestion that's moved half of what I'm reading down the screen and now I have to find my place again.

Whereas, with ChatGPT being separate, it operates at a much less intrusive cadence, and only responds when I ask it to, which turns out to be much more useful and productive.

I'm seriously considering binning my Copilot subscription as a result.



Copilot was surprisingly one of the worst AI products I've used. It keeps suggesting nonsense, and the insertions it does in VSCode often mess up very basic things: it starts writing things in the wrong place (or adds a function inside another function) and the result doesn't even compile. It also seems to have zero understanding of the codebase apart from maybe a few lines before and after. Incredible that this is being sold as a real product at this point.



I canned my Copilot sub.

Even where Microsoft includes it for free, like in their automation tools, it's always been janky. I found myself going to GPT-4 for better answers, which is bad when you spend any time thinking about it.



Have you tried using the shortcut for turning copilot on/off? I know what you mean and in those cases I just turn it off for a second and type freely, then turn it back on.



It's useful for analyzing data, but only if it can be verified. We use it (higher education) to glance at data trends that may need further exploration. So it's a fancy pivot table, I guess.

Most software vendors are selling their version of AI as hallucination free though. So that's terrifying.



>Most software vendors are selling their version of AI as hallucination free though.

It definitely has that Tesla Autopilot feel to it, where the marketing is ... enthusiastic, while the onus to "use the tool responsibly" is left up to the end user.

I'm building a presentation/teaching tool for teachers and trying not to sweep the hallucination angle under the rug. My marketing angle is that the AI is a huge time saver, but the classroom teacher is the subject expert and they need to review the output. At the end of the day, it's still much more time-efficient for an expert to review the content than it is to produce it from scratch, which is a huge win.



Sadly, I think your sales pitch will fall flat. College and University admin aren't technology experts, but are VERY susceptible to flattery from tech sales people. They want to hear about the time and cost savings, regardless of reality.

Also, to be super presumptuous, if you need help with anything with your tool, hit me up. I love working with developers on building realistic use cases and workflows.



>Sadly, I think your sales pitch will fall flat. College and University admin aren't technology experts

I'm building tools for K-12 (I've been a teacher for 20+ years). There's solid uptake of GPT among teachers and individually they're pretty enthusiastic about anything that will reduce their workload, so I'm hoping it resonates.

>Also, to be super presumptuous, if you need help with anything with your tool, hit me up. I love working with developers on building realistic use cases and workflows.

Sure, my details are in my profile, happy to chat.



>I think my general perception is that AI is a great assistant for some occupations like software engineering

I think that's why I like to compare the current state of AI to the state of the CPU industry maybe around the 286-486 era going towards the Pentium.



It's just a requirement that everyone has a license and did 1-hour introductory training. Whether you actually use it or not is up to you, but it's encouraged.



I was implying that AI can be useful in software engineering because hallucinations are expected and actively addressed by engineers. It's less of an issue in comparison to automating strategic business decisions by what is essentially a "black box" and not knowing if it hallucinated data in order to arrive at its decision.



Software engineer for multiple decades here. None of the AI assistants have made any major change to my job. They are useful tools when it's time to write code like many useful tools before them. But the hard work of being a senior+ software engineer comes before you start typing.



I will admit, if I need to do some one-off task and write a quick Python script to do something, I will likely go to Claude or something and write it there. I am talking 20-40 lines. I think it's fine for that: it doesn't need a ton of context, it's easy to test, easy to look at and understand, etc.

But outside of that, beyond needing to remember a certain syntax, I have found that any time I try to use it for anything more complex, I end up spending more time going back and forth trying to get code that works than I would have if I had just done it myself in the first place.

Even if the code works, it just isn't maintainable code if you ask it to do too much. It will just remove entire pieces of functionality.

I have seen a situation where someone submitted a PR after very clearly copying a method, sticking it in AI, and saying "improve this". It made changes for no good reason, and when we asked the person who submitted the PR why they made the changes, we of course got no answer. (These were not just linter changes.)

That's concerning: pushing code up when you can't even explain why you did something?

Like you said with the hard work, sure it can churn out code. But you need to have a complete clear picture of what that code needs to look like before you start generating or you will not like the end result.



I am not a software engineer, and these tools allow me to make a giant mess of an app in a weekend that kind of does what I want, but I only get that weekend. Once I come back to it after any length of time, since I have no idea what I am doing or what is going on, it is impossible to update or add to the app without breaking it.

Now I have all these new ideas but I am back to square one because it just seems easier to start over.

I look forward to more powerful models in the future but I do wonder if it will just mean I can get slightly farther and make an even larger mess of an app in a weekend that I have no way to add to or update without breaking.

The main utility seems like it would be for content creation: as a non-software engineer, I could pretend I made an app with all these great features, while conveniently leaving out the part about it being impossible to update.



They help me most when using a framework, API, or language I'm not super familiar with. Beats stack overflow for that stuff.

But it's weird to me seeing people talk about these changing their jobs so much. Maybe I'm holding it wrong, but I'm almost always bottlenecked by "big picture" challenges rather than by the act of actually typing the code.



My GH Copilot kept recommending incorrect things when I used it alongside common libraries and frameworks. I don't know, I just don't really find it very useful.



That's because programming isn't just about syntax. The real difficulty lies in the problem solving. Sure AI can help with that, but the problem still largely rests on the programmer, as AI won't do anything unless prompted to.



I think the non-obvious benefit is that using LLMs nudges you into putting your thoughts in narrative form and trains that ability, something that someone with more experience does subconsciously.



How did it change your workflow? I'm also a developer with 10+ years of experience. Generating code is not what I miss in my daily life. Understanding what you're doing and creating a mental model of it is the difficult part. Typing in the code is easy. I'm not sure how would a coding assistant help with that.



I also find AI coding assistants useful, particularly when working on a framework heavy codebase (Django, Spring etc.). While I don’t use it to generate massive blocks of code from descriptions just the mostly accurate tab completion of boilerplate code probably saves me 10-20% of my coding time. It’s not a groundbreaking advancement but certainly a decent optimisation.



I have the same experience; I hope some open source/weights model gets to the same standard, as I find a company like Anthropic has too much power to just shut out parts of the world (it did in the beginning, when it was not available in the EU, so I had to use proxies etc.).

It is of vital importance (imho) to get open models to the same level before another jump comes (if it comes at all, of course; maybe another winter is ahead, but at least we'll have something I use every day/all day, so it's not all hype, I think).



BigCos won't let their employees use these tools to write code because you can't let code leak off prem. To keep from falling behind I use it on my weekend projects. I sense no urgency to remedy this inside, but I'm not sure how these tools would help with millions of lines of legacy code. They are awesome for creating new things or editing small things where you can build up context easily.



> BigCos won't let their employees use these tools to write code because you can't let code leak off prem.

Ridiculous blanket statement. A bunch of places use external LLMs.



I feel like currently the group feeling the most impact from Claude is frontend devs, as tools like v0.dev, Claude etc. can almost create a frontend from an API schema (I've only tested it page by page), which is great. It's probably because there are so many examples on the internet, and that's why it works well.



I'm a dummy with very rudimentary script kiddie tier skills, and Claude has helped me do a bunch of personal projects that would normally take months of research and stackoverflow begging in a few days. For me the hype is real, but maybe not trillions $$$ worth of value real.



It helps me stay in flow by keeping me one layer up.

In pair programming, it's ideal to have a driver (hands on keyboard) and a navigator (planning, direction).

Claude can act as the driver most of the time so I can stay at the navigator level.

This is so helpful, as it's easy as programmers to get sucked into implementation details or low-level minutiae that's just not important.



I think tools like Copilot are best for people learning a popular programming language from scratch. You get better answers and easier problems to solve. The more experience you have the more you can outpace it in terms of correctness and appropriateness.



The most valuable thing AI has done for me with coding is commenting my code and creating documentation from it. It saves so much time doing a task that hardly any of us actually want to do - and it does it well.



> yet their company is barely even leveraging it

...do you not see the obnoxious CoPilot(TM) buttons and ads everywhere? It's even infected the Azure Portal - and every time I use it to answer a genuine question I have, I get factually incorrect responses (granted, I don't ask it trivial or introductory-level questions...).



As I mentioned in a sibling comment, I now "pair program" all day. Instead of being the driver and navigator all day, I can mostly sit "one layer up" in the navigator seat.



How is this a change in job description? It may be a failure of imagination on my part, but it sounds like you're still doing what you have always done - you've changed the how.



Right. And in a similar vein, it's very easy to replace a lot of low-effort outsourcing / junior-dev-assignable tasks. I still need to describe the problem and clean up the answer, but I get the answer immediately, and I can re-prompt for it to fix it immediately.



I don't. But that's probably because I'm very opinionated about the implementation details, so I'm scrutinizing and tweaking its output a lot.
