Anti-patterns while working with LLMs

Original link: https://instavm.io/blog/llm-anti-patterns

## LLM Anti-Patterns: Lessons Learned

After 15 months of working with large language models (LLMs), several counterproductive patterns have emerged. **First**, avoid redundant context: an LLM's "memory" is limited, and sending repetitive, near-identical information (such as consecutive screenshots) wastes tokens and degrades performance. **Second**, play to the LLM's strengths. Rather than asking it to do what it is *not* good at, such as counting directly, use its coding ability to *generate* a solution; tool calling through code has also proven more reliable than direct prompting. **Third**, avoid overwhelming the LLM with too much context (beyond roughly 128k tokens): accuracy declines as the model struggles to manage the information, and it may "forget" key details. **Fourth**, because of training-data limitations, LLMs struggle with obscure or recently invented topics; expect lower accuracy and compensate accordingly. **Finally**, stay actively in the loop rather than becoming a "vibe coder": monitor the LLM's output closely, since left unchecked it can introduce subtle bugs or security holes (such as leaking sensitive data).


## Original Post

After working with LLMs for the last 15 months, these are some of the anti-patterns I have discovered.

By anti-patterns, I simply mean patterns or behaviors we should avoid when working with LLMs.

1. Did I tell you that already?

Context is a scarce resource, probably worth its weight in gold, so we need to use it wisely. One lesson learned is to not send the same information multiple times in the same session.

For example, during computer use, sending every single frame as the mouse moves from point A to point B on the screen, with barely anything changing between consecutive frames (the pointer moving a millimeter, say), in each API call, when a single new screenshot showing the current state would be enough.

It's somewhat ironic that the same company came up with a context-management tool/API, which helps you reduce or compress the context by removing redundant messages, while doing the exact opposite for computer use: re-sending all the previous, nearly duplicate screenshots in every new LLM API call. We built the open-source click3, which works without sending possibly duplicate screenshots in API calls. Screenshots with significant differences (or taken at state changes) are enough for the LLM to decide its next course of action.
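The idea can be sketched in a few lines. This is a hypothetical illustration, not the actual click3 implementation: frames are flat grayscale byte arrays, and the 5% change threshold is an arbitrary choice.

```python
# Sketch: drop near-duplicate screenshots before sending them to the LLM.
# Hypothetical illustration, not the actual click3 code.

def frame_diff_ratio(a: bytes, b: bytes) -> float:
    """Fraction of pixels that changed between two equally sized frames."""
    if len(a) != len(b):
        return 1.0  # different dimensions: treat as a full change
    changed = sum(1 for x, y in zip(a, b) if x != y)
    return changed / len(a)

def significant_frames(frames: list[bytes], threshold: float = 0.05) -> list[bytes]:
    """Keep the first frame, then only frames that differ noticeably
    from the last frame that was kept."""
    kept: list[bytes] = []
    for frame in frames:
        if not kept or frame_diff_ratio(kept[-1], frame) >= threshold:
            kept.append(frame)
    return kept

# Ten frames where the "mouse" barely moves, then one real state change.
frames = [bytes(100)] * 10 + [bytes([255]) * 100]
print(len(significant_frames(frames)))  # 2: the first frame plus the state change
```

A real implementation would use a perceptual hash or a proper image diff, but the principle is the same: only state changes earn a slot in the context.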

2. Asking a fish to climb a tree

Should we ask a fish to climb a tree? Sure, sometimes it can, but it's better to ask it to do the things it is good at. For example, asking Gemini Banana to generate an image of text on a wooden plank, where the text starts with the prefix 1AA.. (notice the double A), always ended up with 1A.. (a single A). After 13 or so tries, I decided to give up. Later I had an idea: write the text in a Google Doc, take a picture of it, then hand over that picture and ask the model to merge it onto a wooden-plank picture (also supplied by me). It did it in one shot.

Similarly, we should not ask LLMs how many Rs there are in BLUEBERRY; we should ask them to write code that counts the Rs. Coding ability > counting ability, at least for current LLMs.
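The code the model would generate for this is trivially correct, which is the whole point, while asking the model to count tokens directly often is not:

```python
# Instead of asking the model "how many Rs are in BLUEBERRY?",
# ask it to write the counting code.
word = "BLUEBERRY"
count = word.upper().count("R")
print(count)  # 2
```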

Take another example: Cloudflare recently realized that tool calling works better when it's written as code that calls the tools. So it seems we should ask the model to generate code whenever we expect more accurate answers.
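A minimal sketch of that "code mode" idea: instead of the model emitting one JSON tool call per round-trip, it emits a short script that composes the tools directly. The tool functions below are hypothetical stand-ins, not Cloudflare's actual API.

```python
# Hypothetical tools the model would normally reach via JSON tool calls.

def get_weather(city: str) -> float:
    """Pretend tool: returns a temperature in Celsius."""
    return {"Paris": 18.0, "Oslo": 7.0}.get(city, 15.0)

def send_report(text: str) -> str:
    """Pretend tool: 'sends' a report and returns a confirmation."""
    return f"sent: {text}"

# What the model would generate in code mode: loops, comparisons, and
# composition are free here, whereas JSON tool calling would need one
# LLM round-trip for every step.
cities = ["Paris", "Oslo"]
coldest = min(cities, key=get_weather)
print(send_report(f"Coldest city today: {coldest}"))
```

The win is that intermediate results flow through ordinary variables instead of being round-tripped through the model's context.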

The climbing perch - A tree climbing fish

3. Asking an LLM to speak when it's drowning (in context)

LLMs do best when their context is not nearly full at 128k tokens. Long-running sessions that go beyond the 128k-token count can be even worse: we then depend on Claude's ability to compress or discard information at its whim. For example, the other day it completely forgot a database connection URL I had given it and started spitting out someone else's database URL in the same session. Thankfully (for them), that URL didn't work. Unfortunately, some tasks do need big contexts; my only advice in that case is to be aware of the accuracy decline.
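One defensive habit is to manage the window yourself rather than trusting the model's own compression: drop old chatter but pin critical facts (like that database URL) so they always survive. A rough sketch, where token counting is a crude word-count proxy rather than a real tokenizer:

```python
# Sketch: keep a session under a token budget by dropping the oldest
# unpinned messages. Pinned facts (e.g. a database URL) always survive.

def estimate_tokens(message: str) -> int:
    # Crude proxy; a real client would use the model's tokenizer.
    return len(message.split())

def trim_history(messages: list[str], pinned: set[int], budget: int) -> list[str]:
    """Drop the oldest unpinned messages until the total fits the budget."""
    kept = list(enumerate(messages))
    total = sum(estimate_tokens(m) for _, m in kept)
    for i, m in list(kept):
        if total <= budget:
            break
        if i in pinned:
            continue  # never drop pinned facts
        kept.remove((i, m))
        total -= estimate_tokens(m)
    return [m for _, m in kept]

history = [
    "postgres://user:pw@db.example/prod",          # critical: pin it
    "long exploratory discussion " + "blah " * 100,  # droppable chatter
    "current task: write the migration",
]
trimmed = trim_history(history, pinned={0}, budget=40)
print(trimmed[0])    # the pinned URL survives
print(len(trimmed))  # 2: the long middle message was dropped
```

Explicit trimming like this beats hoping the model discards the right things on its own.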

Some random database URL, from its memory

4. The squeaky wheel gets the grease

LLMs don't perform well on obscure topics. The same goes, as expected, for topics invented after their training cut-off dates, for the simple reason that they were never trained on them. They perform well on topics that have been widely discussed. So if your topic is an obscure one, assume less accuracy and figure out ways to compensate. Here is an instance of Claude CLI giving up on a Stripe integration, which, by the way, has some of the nicest documentation:

5. You don't want to be a vibe-coder

It's easy to slip into manager mode (or, as Andrej Karpathy calls it, vibe-coder mode) with a tool like Claude Code, but in my observation, if you lose sight of what the LLM is writing, it will eventually be a net loss. Never lose the thread of what's going on. For example, in the /invoices API, Claude decided it was fine to put the User object in the response JSON, since it is part of the invoice object. Only I could see that it was unnecessarily exposing the password_hash. It is not an immediate security issue, but if something goes wrong and attackers get access to the invoice JSONs, it only helps them extract more sensitive information. Or imagine someone not even hashing the password and having it exposed. You get the point.
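The /invoices pitfall above can be sketched concretely. The field names here are hypothetical, mirroring the post's example: dumping the whole nested object leaks password_hash, while an explicit whitelist does not.

```python
# Sketch: why serializing nested objects wholesale leaks sensitive fields.
from dataclasses import dataclass, asdict

@dataclass
class User:
    id: int
    email: str
    password_hash: str  # must never reach an API response

@dataclass
class Invoice:
    id: int
    amount: float
    user: User

def invoice_response(invoice: Invoice) -> dict:
    """Whitelist response fields instead of dumping whole objects."""
    return {
        "id": invoice.id,
        "amount": invoice.amount,
        "user": {"id": invoice.user.id, "email": invoice.user.email},
    }

inv = Invoice(1, 99.0, User(7, "a@b.com", "sha256$..."))
naive = asdict(inv)           # leaks password_hash via the nested User
safe = invoice_response(inv)  # explicit whitelist
print("password_hash" in str(naive), "password_hash" in str(safe))  # True False
```

This is exactly the kind of subtle decision a model makes silently, and exactly why the human needs to keep reading the diff.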

References:

  1. https://www.anthropic.com/news/context-management

  2. https://github.com/instavm/clickclickclick

  3. https://blog.cloudflare.com/code-mode/

  4. https://github.com/instavm/coderunner
