Building an internal agent: Code-driven vs. LLM-driven workflows

Original link: https://lethain.com/agents-coordinators/

This post details Imprint's approach to building internal workflows, which initially centered on using an LLM with tool access to handle complex tasks. The author originally believed an LLM could solve *any* workflow, but found that some problems are simply better, faster, and cheaper to solve with conventional code.

To address this, they implemented a system that supports both LLM-driven and code-driven workflows. A central handler orchestrates the process and defaults to the LLM, but a workflow can be switched to a "script coordinator" that runs custom Python code with full access to the same tools as the LLM. This lets engineers build deterministic solutions for tasks where LLM reliability is in doubt, such as accurately identifying merged pull requests (the initial LLM attempt produced false positives).

Today they start every workflow with the LLM and progressively enhance it with code where needed; Claude Code can usually convert an LLM prompt into working code. The conclusion: even as LLMs keep improving, strategically reserving them for tasks that truly require intelligence, and using code for the rest, remains a valuable long-term strategy.


Original article

When I started this project, I knew deep in my heart that we could get an LLM plus tool-usage to solve arbitrarily complex workflows. I still believe this is possible, but I’m no longer convinced this is actually a good solution. Some problems are just vastly simpler, cheaper, and faster to solve with software. This post talks about our approach to supporting both code and LLM-driven workflows, and why we decided it was necessary.

This is part of the Building an internal agent series.

Why determinism matters

When I joined Imprint, we already had a channel where folks would share pull requests for review. It wasn’t required to add pull requests to that channel, but it was often the fastest way to get someone to review it, particularly for cross-team pull requests.

I often start my day by skimming for pull requests that need a review in that channel, and quickly realized that often a pull request would get reviewed and merged without someone adding the :merged: reacji onto the chat. This felt inefficient, but also extraordinarily minor, and not the kind of thing I want to complain about. Instead, I pondered how I could solve it without requiring additional human labor.

So, I added an LLM-powered workflow to solve this. The prompt was straightforward:

  1. Get the last 10 messages in the Slack channel
  2. For each one, if there was exactly one Github pull request URL, extract that URL
  3. Use the Github MCP to check the status of each of those URLs
  4. Add the :merged: reacji to messages where the associated pull request was merged or closed
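
Spelled out as prompt text, those steps might look something like this (an illustrative rendering, not the exact prompt):

You have access to Slack and Github tools. Fetch the last 10 messages
from the review channel. For each message that contains exactly one
Github pull request URL, extract that URL and check its status using
the Github MCP. If the pull request has been merged or closed, add the
:merged: reacji to that message.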

This worked so well! So, so well. Except, ahh, except that it sometimes decided to add :merged: to pull requests that weren’t merged. Then no one would look at those pull requests. So, it worked in concept–so much smart tool usage!–but in practice it actually didn’t solve the problem I was trying to solve: erroneous additions of the reacji meant folks couldn’t evaluate whether to look at a given pull request in the channel based on the reacji’s presence.

(As an aside, some people really don’t like the term reacji. Don’t complain to me about it, this is what Slack calls them.)

How we implemented support for code-driven workflows

Our LLM-driven workflows are orchestrated by a software handler. That handler works something like:

  1. Trigger comes in, and the handler selects which configuration corresponds with the trigger
  2. Handler uses that configuration and trigger to pull the associated prompt, load the approved tools, and generate the available list of virtual files (e.g. files attached to a Jira issue or Slack message)
  3. Handler sends the prompt and available tools to an LLM, then coordinates tool calls based on the LLM’s response, including e.g. making virtual files available to tools. The handler also has termination conditions where it prevents excessive tool usage, and so on
  4. Eventually the LLM will stop recommending tools, and the final response from the LLM will be used or discarded depending on the configuration (e.g. configuration can determine whether the final response is sent to Slack)
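
In code terms, that loop looks roughly like the following Python sketch; every name and interface below is hypothetical, not the actual implementation:

def handle_trigger(trigger, registry, llm, max_tool_calls=25):
    # 1. Select the configuration that corresponds with the trigger
    config = registry.match(trigger)
    # 2. Pull the prompt, approved tools, and virtual files
    prompt = config.render_prompt(trigger)
    tools = config.approved_tools()        # {name: tool}
    files = config.virtual_files(trigger)  # e.g. Jira or Slack attachments

    messages = [{"role": "user", "content": prompt}]
    # 3. Coordinate tool calls; the cap is one termination condition
    for _ in range(max_tool_calls):
        response = llm.complete(messages, tools=list(tools.values()))
        if not response.tool_calls:
            break  # 4. The LLM stopped recommending tools
        for call in response.tool_calls:
            result = tools[call.name].run(call.args, files=files)
            messages.append({"role": "tool", "content": result})
    # Use or discard the final response depending on the configuration
    if config.send_final_response:
        send_to_slack(trigger.channel, response.text)  # hypothetical helper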

We updated our configuration to allow running in one of two modes:

# this is default behavior if omitted
coordinator: llm

# this is code-driven workflow
coordinator: script
coordinator_script: scripts/pr_merged.py

When the coordinator is set to script, then instead of using the handler to determine which tools are called, custom Python is used. That Python code has access to the same tools, trigger data, and virtual files as the LLM-handling code. It can use the subagent tool to invoke an LLM where useful (and that subagent can have full access to tools as well), but LLM control only occurs when explicitly desired.
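
For the pull request workflow described earlier, the script version might look roughly like this (an illustrative sketch; the tool interface shown here is hypothetical, not the actual contents of scripts/pr_merged.py):

import re

# Matches Github pull request URLs like https://github.com/org/repo/pull/123
PR_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")

def run(tools, trigger):
    # Same inputs the LLM coordinator would get: tools and trigger data
    messages = tools.slack.recent_messages(trigger.channel, limit=10)
    for msg in messages:
        urls = PR_URL.findall(msg.text)
        if len(urls) != 1:  # only messages with exactly one PR URL
            continue
        pr = tools.github.pull_request(urls[0])
        # Deterministic check: no LLM judgment involved
        if pr.merged or pr.state == "closed":
            tools.slack.add_reaction(msg, "merged")

Because the merge status is a plain API field rather than a model judgment, the false-positive failure mode disappears.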

This means that these scripts–which are being written and checked in by our software engineers, going through code review and so on–have the same permissions and capabilities as the LLM, although given it’s just code, any given commit could also introduce a new dependency, etc.

How’s it working? / Next steps?

Altogether, this has worked very well for complex workflows. I would describe it as a “solution of frequent resort”, where we use code-driven workflows as a progressive enhancement for workflows where LLM prompts and tools aren’t reliable or quick enough. We still start all workflows using the LLM, which works for many cases. When we do rewrite, Claude Code can almost always rewrite the prompt into the code workflow in one shot.

Even as models get more powerful, relying on them narrowly in cases where we truly need intelligence, rather than for iterative workflows, seems like a long-term addition to our toolkit.
