LLMs can be exhausting

Original link: https://tomjohnell.com/llms-can-be-absolutely-exhausting/

## LLM Fatigue: It Might Be You, Not the Model

Working with large language models (LLMs) like Claude or Codex can be surprisingly exhausting. The author finds that long sessions often end in frustration and low productivity, and the first instinct is to blame the model itself: suspected "dumbing down," context limits, or bloated harnesses. The core problem, however, usually lies in the *user's* fatigue.

Fatigue directly degrades prompt quality, leading to interruptions and a negative feedback loop. Slow turnaround, especially on tasks like processing large files, makes things worse, producing frustratingly slow iteration cycles.

The key to avoiding this "doom loop" is self-awareness. When prompt writing feels forced or impatient, it's a signal to take a break. Instead of relying on the AI to fill gaps in your own thinking, focus on writing clear, confident prompts with an explicit desired outcome.

Crucially, a slow feedback loop should be treated as *the problem itself*. By explicitly asking for faster iteration, mimicking test-driven development, and supplying a clear failure case, the LLM can optimize for speed and efficiency, ultimately consuming less context and delivering smarter results. In the end, success with LLMs depends on recognizing and mitigating one's own fatigue and prioritizing a fast, focused workflow.

## LLMs and Developer Fatigue: Discussion Summary

A recent Hacker News discussion focused on the surprising exhaustion of coding with large language models (LLMs). While LLMs can increase speed, many developers find the constant supervision, prompting, and review mentally draining.

Some commenters advocate **asynchronous workflows**: disabling distracting notifications, focusing on only a few tasks at a time, and using the "wait time" for related work. The key is to stop chasing maximum LLM utilization and allow for idle time.

Others note that the core challenge is no longer the coding itself (now often trivial) but the **constant decision-making** of steering the LLM and planning the project.

A recurring theme is the difficulty of **reviewing LLM-generated code**, especially when quality suffers. Some find this more exhausting than traditional coding, since the human remains the bottleneck.

There is debate over whether this is a "skill issue" that better prompting techniques would solve, or a fundamental limitation of current LLM capability. Ultimately, the discussion highlights the tension between the promise of AI-assisted development and the reality of managing its shortcomings.

Original article
LLMs can be absolutely exhausting | Tom Johnell

Some days I get in bed after a tortuous 4-5 hour session working with Claude or Codex wondering what the heck happened. It's easy to blame the model - there are so many options to choose from:

  1. They're dumbing down the model to save money.
  2. Context rot!
  3. Codex/Claude Code/[insert harness] is getting bloated.

It's not uncommon for me to come back to the problem the next day, my own context window cleared from rest, and find a fast and fulfilling path forward with the help of the LLM. What's going on?

I'm tired & experiments are too slow

As I get more tired, the quality of my prompts degrades

This one seems pretty obvious. If I am becoming mentally fatigued, I will write worse prompts, and because of that, the AI will do a worse job. Here's an example of what happens when I'm really tired: Kick off a somewhat meaty prompt (after 30% of context was used to align with the AI on the problem), realize right after submitting that I missed some key context, interrupt the LLM, provide the context, and then have it proceed. Without a doubt, interrupting Claude Code or "steering" in Codex leads to worse outcomes.

Feedback loop is too slow and context is bloated

Some of the work I'm doing right now requires parsing some large files. There are bugs in that parsing logic that I'm trying to work through with the LLM. The problem is, every tweak requires re-parsing, and it's a slow process. I liken it to a slot machine that takes 10 minutes to spin. To add insult to injury, some of these tasks take quite a bit of context to get rolling on a new experiment, and by the end of the parsing job, the LLM is 2% away from compaction. That then leads to either a very dumb AI or an AI that is pretending to know what's going on with the recent experiment once it's complete.

The happy path with AI

Avoiding the doom-loop psychosis caused by bad prompting

If I reach the point where I am not getting joy out of writing a great prompt, then it's time to throw in the towel. That has to be the first signal. If I'm half-assing it, being short, interrupting, and getting frustrated, then time to take a break.

There's some metacognition that needs to take place here. Am I being less descriptive because I haven't actually thought through this problem and I'm hoping the AI will just fill the gaps? That can be a very seductive trap to fall into. AIs are getting quite good at filling in undefined requirements, something that I remember having to do as a software engineer myself, but they're not good enough yet.

There are times I write a prompt with so much clarity in my desired end-state that I'm already celebrating the end-result when I submit the prompt, because I know the AI is going to CRUSH IT. That's the feeling I need to look for in every prompt. If it's more a feeling of uncertainty or impatience, it's just not going to pan out.

Recognizing slow feedback loops and making those the problem

In the case of my parsing problem I mentioned above, it was too slow and the feedback loop was painful. I want my slot machine to take seconds/minutes to spin, not 15/20/30 minutes. In these cases, I've started to spin up a new session with the LLM, lay out my problem with feedback loop speed, express my desire to get to a sub 5-minute loop, give it an example of a failure case, and ask it to reproduce that failure case as quickly as possible. This is starting to sound familiar ... TDD anyone?
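That TDD-style move - pin the failure down in a small, fast test before asking for a fix - can be sketched as follows. This is a minimal, hypothetical example: `parse_records` and the ragged-row bug are stand-ins for whatever the real parser and failure case look like, not the author's actual code.

```python
# Hypothetical sketch of capturing a parsing bug in a fast repro test,
# instead of re-running a slow parse of the full file on every tweak.

def parse_records(text):
    # Stand-in "buggy" parser: naively splits each line on commas and
    # never checks that rows have a consistent number of fields.
    return [line.split(",") for line in text.splitlines() if line]

def test_reproduces_ragged_row_failure():
    # A two-line crafted input reproduces the failure in milliseconds,
    # rather than waiting minutes for the real multi-file parsing job.
    sample = "42,alice\n43,bob,extra\n"
    rows = parse_records(sample)
    # The bug we're chasing, now pinned: row lengths disagree.
    assert len(rows[0]) != len(rows[1])
```

Once the failure is captured this cheaply, each "spin of the slot machine" becomes a sub-second test run instead of a full re-parse.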

I was always the scrappy engineer. Sure I wrote tests, but I was never one to stop and create elaborate test cases or integration tests for bespoke problems. That was too time consuming, and also, I was getting paid even if my feedback loop wasn't perfect.

It's been quite the journey to fight that feeling that writing elaborate tests is time-consuming when working with AI. If you give an LLM clear success criteria: "Reproduce this specific failure case and make sure the clock time is less than 5 minutes to do it. Feel free to experiment with ways to optimize the code path or omit certain pieces that are unnecessary to reproduce" - the AI will not only reproduce the problem (maybe slowly the first time), but it will create levers for a faster feedback cycle. With that fast feedback cycle, it will consume less context and be SMARTER. This can seriously save hours of debugging time.
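The "clock time under the budget" criterion handed to the LLM can itself be encoded as a test, so the speed of the feedback loop is checked mechanically rather than by feel. A minimal sketch, assuming a hypothetical `run_repro` function standing in for the minimized reproduction:

```python
import time

def run_repro():
    # Stand-in for the minimized failure reproduction; the real one
    # would run the trimmed parsing path on a small input slice.
    time.sleep(0.01)
    return "ragged row at line 2"  # hypothetical failure signature

def test_repro_is_fast_enough():
    start = time.monotonic()
    result = run_repro()
    elapsed = time.monotonic() - start
    # The two success criteria from the prompt: the failure is
    # reproduced, AND the loop stays under the time budget
    # (5 minutes in the post; 300 seconds here).
    assert "ragged row" in result
    assert elapsed < 300, f"repro took {elapsed:.1f}s, over budget"
```

Framing the budget as an assertion gives the LLM a concrete lever: any optimization it tries either keeps the test green or visibly fails.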

Conclusion

When I am exhausted from working with an LLM - it might actually be a "skill issue". I need to recognize when I'm tired and entering the doom-loop psychosis. Cognitive outsourcing of requirements is seductive, but it's a trap. If I'm not enjoying the act of writing the perfect prompt, absolutely confident I will return to a result I'm 95% happy with, I need to either take a break or ponder whether I've really thought through the problem. If things are moving slowly and it feels as though context is filling up too quickly - I need to make that the problem to solve. Find a path, with the help of the LLM, to iterate faster and use up less context.

