Get the hell out of LLMs immediately!
Don’t let an LLM make decisions or execute business logic

原始链接: https://sgnt.ai/p/hell-out-of-llms/

Don't use LLMs for core application logic. They are bad at decision-making, state management, and complex computation. Treat them instead as a user interface that translates natural language into API calls against specialized systems. For example, rather than having an LLM manage a chess game, use a chess engine and let the LLM only translate user input into chess moves. This approach gives better performance, easier debugging, and more reliable results. LLMs are good at tasks such as:

  • Converting user input into structured data.
  • Classifying user requests and routing them to the appropriate module.
  • Understanding human language and context.

By confining LLMs to these roles, you get their strengths without sacrificing precision, reliability, or efficiency. Even as LLMs improve, remember that for core logic, specialized systems will likely remain easier to manage, more cost-effective, and easier to maintain.

An article on Hacker News, "Get rid of large language models (LLMs) as soon as possible," which argues against using LLMs for core business logic, sparked a lively discussion. Commenters debated the appropriate uses of LLMs, suggesting that they excel at turning unstructured data into structured data and at front-end tasks. One user described using an LLM to generate an educational game, but found it performed poorly without heavy human orchestration. Others warned against using LLMs for critical tasks that demand precision and determinism. The article's author clarified his position that LLMs should not execute logic, which led to a disagreement with a user who had successfully used LLMs to implement logic in a system serving millions of users. The discussion highlighted the potential of LLMs as tools for specific tasks, such as generating visualizations or user interfaces, while also underscoring the need for human oversight and deterministic code for core functionality.

Original article
Get out of there

Don’t let an LLM make decisions or implement business logic: they suck at that. I build NPCs for an online game, and I get asked a lot “How did you get ChatGPT to do that?” The answer is invariably: “I didn’t, and also you shouldn’t”.

In most applications, the LLM should only be the user interface between the user and an API into your application logic. The LLM shouldn't be implementing any logic. Get the hell out of the LLM as soon as possible, and stay out as long as you can.

Y Tho?

This is best illustrated by a contrived example: you want to write a chess-playing bot you access over WhatsApp. The user sends a description of what they want to do (“use my bishop to take the knight”), and the bot plays against them.

Could you get the LLM to be in charge of maintaining the state of the chess board and playing convincingly? Possibly, maybe. Would you? Hell no, for some intuitive reasons:

  • Performance: It’s impressive that LLMs might be able to play chess at all, but they suck at it (as of 2025-04-01). A specialized chess engine is always going to be a faster, better, cheaper chess player. Even modern chess engines like Stockfish that incorporate neural networks are still purpose-built specialized systems with well-defined inputs and evaluation functions - not general-purpose language models trying to maintain game state through text.
  • Debugging and adjusting: It’s impossible to reason about and debug why the LLM made a given decision, which means it’s very hard to change how it makes those decisions if you need to tweak them. You don’t understand the journey it took through the high-dimensional semantic space to get to your answer, and it’s really poor at explaining it too. Even purpose-built neural networks like those in chess engines can be challenging for observability, and a general LLM is a nightmare, despite Anthropic’s great strides in this area.
  • And the rest…: testing LLM outputs is much harder than unit-testing known code-paths; LLMs are much worse at math than your CPU; LLMs are insufficiently good at picking random numbers; version-control and auditing becomes much harder; monitoring and observability gets painful; state management through natural language is fragile; you’re at the mercy of API rate limits and costs; and security boundaries become fuzzy when everything flows through prompts.
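
Coming back to the chess example, here is a minimal sketch of the division of labour I'm arguing for. It assumes the python-chess package and a hypothetical llm_translate_move() wrapper around whichever model you use: the LLM only turns the player's message into a move in standard algebraic notation, while board state, legality checks, and the reply all live in purpose-built code.

```python
# Sketch only: assumes the python-chess package and a hypothetical
# llm_translate_move() wrapper around your LLM of choice.
import chess
import chess.engine

def llm_translate_move(message: str, board: chess.Board) -> str:
    """Hypothetical LLM call: turn 'use my bishop to take the knight'
    into a SAN move such as 'Bxf6'. A real prompt would include
    board.fen() so the model sees the current position."""
    raise NotImplementedError("wire up your LLM provider here")

def handle_player_message(message: str, board: chess.Board,
                          engine: chess.engine.SimpleEngine) -> str:
    san = llm_translate_move(message, board)        # LLM: free text -> candidate move
    try:
        board.push_san(san)                         # python-chess: legality + state
    except ValueError:
        return f"'{san}' isn't a legal move in this position."
    result = engine.play(board, chess.engine.Limit(time=0.1))  # chess engine picks the reply
    reply_san = board.san(result.move)              # describe the reply before applying it
    board.push(result.move)
    return f"You played {san}; I reply {reply_san}."

# Usage (assumes a Stockfish binary on PATH):
# engine = chess.engine.SimpleEngine.popen_uci("stockfish")
# board = chess.Board()
# print(handle_player_message("use my bishop to take the knight", board, engine))
```

The LLM never sees the board as state it has to maintain; it only translates one message at a time, and everything that has to be correct is ordinary, testable code.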

Examples

The chess example illustrates the fundamental problem with using LLMs for core application logic, but this principle extends far beyond games. In any domain where precision, reliability, and efficiency matter, you should follow the same approach:

  1. The user says they want to attack player X with their vorpal sword? The LLM shouldn’t be the system figuring out whether the user has a vorpal sword, or what the results of that would be: the LLM is responsible only for translating the free text the user gave you into an API call, and for translating the result into text for the user (see the sketch after this list)
  2. You’re building a negotiation agent that should respond to user offers? The LLM isn’t in charge of the negotiation, just in charge of packaging it up, passing it off to the negotiating engine, and telling the user about the result
  3. You need to make a random choice about how to respond to the user? The LLM doesn’t get to choose
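
To make the first example concrete, here is a minimal sketch of that split. llm_parse_command() is a hypothetical stand-in for whatever model you call; everything about inventory and outcomes lives in plain, testable code.

```python
# Sketch of the vorpal-sword example: the LLM only emits a structured
# command; deterministic code decides whether the attack is even possible.
from dataclasses import dataclass, field

@dataclass
class Player:
    name: str
    inventory: set = field(default_factory=set)

def llm_parse_command(text: str) -> dict:
    """Hypothetical LLM call: 'attack player X with my vorpal sword'
    -> {'action': 'attack', 'target': 'X', 'weapon': 'vorpal sword'}."""
    raise NotImplementedError("wire up your LLM provider here")

def resolve_attack(attacker: Player, cmd: dict) -> dict:
    """Deterministic game logic: the LLM never decides any of this."""
    if cmd["weapon"] not in attacker.inventory:
        return {"ok": False, "reason": "missing_weapon", "weapon": cmd["weapon"]}
    # ...damage rolls, target lookup, etc. also live here, not in a prompt...
    return {"ok": True, "action": "attack",
            "target": cmd["target"], "weapon": cmd["weapon"]}
```

The structured result then goes back through the LLM only to be phrased for the player, never to be re-decided.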

Reminder of what LLMs are good at

While I’ve focused on what LLMs shouldn’t do, it’s equally important to understand their strengths so you can leverage them appropriately:

LLMs excel at transformation and at categorization, and have a pretty good grounding in “how the world works”, and this is where in your process you should be deploying them.

The LLM is good at taking “hit the orc with my sword” and turning it into attack(target="orc", weapon="sword"). Or taking {"error": "insufficient_funds"} and turning it into “You don’t have enough gold for that.”
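
A minimal sketch of that two-way translation; llm() is a hypothetical wrapper for whichever provider you use, and the prompts are only illustrative:

```python
import json

def llm(prompt: str) -> str:
    """Hypothetical wrapper around your LLM provider; returns raw text."""
    raise NotImplementedError("wire up your LLM provider here")

def parse_action(user_text: str) -> dict:
    """Inbound: free text -> a structured call for the game engine."""
    prompt = ("Convert the player's message into JSON with keys "
              "'action', 'target' and 'weapon'.\nMessage: " + user_text)
    return json.loads(llm(prompt))  # e.g. {"action": "attack", "target": "orc", "weapon": "sword"}

def explain_result(result: dict) -> str:
    """Outbound: a structured engine result -> friendly text for the user."""
    prompt = ("In one short sentence, explain this result to the player: "
              + json.dumps(result))
    return llm(prompt)  # e.g. "You don't have enough gold for that."
```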

The LLM is good at figuring out what the hell the user is trying to do and routing it to the right part of your system. Is this a combat command? An inventory check? A request for help?
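
A sketch of that routing step, reusing the same kind of hypothetical llm() helper; the label set and the handlers are invented for illustration:

```python
def llm(prompt: str) -> str:
    """Hypothetical wrapper around your LLM provider."""
    raise NotImplementedError("wire up your LLM provider here")

def handle_combat(text: str) -> str:
    return "combat engine result"      # placeholder for real combat logic

def handle_inventory(text: str) -> str:
    return "inventory lookup result"   # placeholder for real inventory logic

def handle_help(text: str) -> str:
    return "help text"                 # placeholder for real help logic

ROUTES = {"combat": handle_combat, "inventory": handle_inventory, "help": handle_help}

def route(user_text: str) -> str:
    """The LLM picks one label from a fixed set; dispatch stays in plain code."""
    prompt = ("Classify the message as exactly one of: combat, inventory, help.\n"
              f"Message: {user_text}\nLabel:")
    label = llm(prompt).strip().lower()
    return ROUTES.get(label, handle_help)(user_text)  # unknown labels fall back deterministically
```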

Finally, the LLM is good at knowing about human concepts, and knowing that a “blade” is probably a sword and “smash” probably means attack.

Notice that all these strengths involve transformation, interpretation, or communication—not complex decision-making or maintaining critical application state. By restricting LLMs to these roles, you get their benefits without the pitfalls described earlier.

The future

What LLMs can and can’t do is ever-shifting and reminds me of the “God of the gaps”, a term from theology where each mysterious phenomenon was once explained by divine intervention, until science filled that gap. Likewise, people constantly identify new “human-only” tasks to claim that LLMs aren’t truly intelligent or capable. Then, just a few months later, a new model emerges that handles those tasks just fine, forcing everyone to move the goalposts again, examples passim. It’s a constantly evolving target, and what seems out of reach today may be solved sooner than we expect.

And so, as in our chess example, we will probably soon end up with LLMs that can handle all of the above examples reasonably well. I suspect, however, that most of the drawbacks won’t go away: the non-LLM logic that you hand off to is going to be easier to reason about, easier to maintain, cheaper to run, and more easily version-controlled.

Even as LLMs continue to improve, the fundamental architectural principle remains: use LLMs for what they’re best at—the interface layer—and rely on purpose-built systems for your core logic.
