Don't waste your back pressure.
Provide agents with automated feedback

Original link: https://banay.me/dont-waste-your-backpressure/

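Successful applications of AI agents now depend on establishing "back pressure": automated feedback loops on quality and correctness. This lets agents handle more complex, longer-horizon tasks through self-correction instead of constantly needing human intervention. The key is equipping agents with tools beyond basic editing, such as build systems, type checking, UI rendering comparison (via tools like Playwright or Chrome DevTools), and even formal verification systems such as proof assistants. These tools let agents verify their own work and learn from their mistakes. Investing in strong testing and feedback mechanisms can significantly improve engineering efficiency: instead of manually reviewing trivial errors (such as a missing import), engineers can focus on higher-level problems. Languages with strong type systems and helpful error messages are especially valuable, as are techniques such as spec-driven development and automatically generated documentation for comparison. Ultimately, building back pressure into the workflow is essential for scaling agents' contributions and maximising their potential. Otherwise, engineers may waste valuable time on repetitive error correction.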

Hacker News: Don't waste your back pressure (banay.me), 13 points, submitted by ghuntley 1 hour ago
Original text

You might notice a pattern in the most successful applications of agents over the last year. Projects that set up structure around the agent itself, providing it with automated feedback on quality and correctness, have been able to push agents to work on longer-horizon tasks.

This back pressure helps the agent identify mistakes as it progresses, and models are now good enough that this feedback can keep them aligned to a task for much longer. As an engineer, this means you can increase your leverage by delegating progressively more complex tasks to agents, while increasing your trust that the completed work meets a satisfactory standard.

Imagine for a second that you only gave an agent tools that allow it to edit files. Without a way to interact with a build system, the model relies on you for feedback about whether the change it made is sensible. This means you spend your back pressure (the time you spend giving feedback to agents) on typing a message telling the agent it missed an import. This scales poorly and limits you to working on simple problems.

If you’re directly responsible for checking that each line of code produced is syntactically valid, then that’s time taken away from thinking about the larger goals or problems in your software. You’re going to struggle to derive more leverage out of agents because you are caught up in trivial changes. If instead you give the agent tools that allow it to run bash commands, it can run a build, read the feedback, and correct itself. You no longer need to be involved in those tasks and can instead focus on higher-complexity work.
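As a rough illustration of the "run a build, read the feedback" step, here is a minimal Python sketch. The names `CheckResult` and `run_check` are my own placeholders, not part of any agent framework, and a real harness would likely run your project's actual build or test command instead of the stand-in used here.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool        # True when the command exited with status 0
    feedback: str   # text that would be handed back to the agent


def run_check(command: list[str], timeout_s: float = 120.0) -> CheckResult:
    """Run one build/test command and package its output as feedback text."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return CheckResult(False, f"{command!r} timed out after {timeout_s}s")
    except FileNotFoundError:
        return CheckResult(False, f"command not found: {command[0]}")
    output = (proc.stdout + proc.stderr).strip()
    # Keep only the tail so very long logs do not flood the agent's context.
    return CheckResult(proc.returncode == 0, output[-4000:])


if __name__ == "__main__":
    # Stand-in for a real build: run a snippet that forgot to import `os`.
    result = run_check([sys.executable, "-c", "print(os.getcwd())"])
    print(result.ok)
    print(result.feedback)
```

The point of the design is that the missing import surfaces as text the agent can read on its own, rather than as a message you have to type.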

Languages with expressive type systems have been growing in popularity in part because of back pressure. Type systems allow you to describe better contracts in your program. They can make invalid states impossible to represent in the first place. They can help you identify edge cases that you might not otherwise handle. Being able to lean on these features is another way of creating back pressure, which you can direct as feedback on changes made by an agent.
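To make "invalid states impossible to represent" concrete, here is a small sketch in Python with type hints. The `RequestState` example and all of its names are hypothetical and chosen only for illustration. It assumes you run a static checker such as mypy or pyright over the code, since Python itself does not enforce the annotations.

```python
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass(frozen=True)
class Loading:
    pass


@dataclass(frozen=True)
class Loaded:
    body: str


@dataclass(frozen=True)
class Failed:
    error: str


# A request is in exactly one state. "Loaded with an error" has no
# constructor, so that invalid combination cannot be built at all.
RequestState = Union[Loading, Loaded, Failed]


def assert_never(value: NoReturn) -> NoReturn:
    """Exhaustiveness helper: a static checker should reject reachable calls."""
    raise AssertionError(f"unhandled state: {value!r}")


def describe(state: RequestState) -> str:
    if isinstance(state, Loading):
        return "loading"
    if isinstance(state, Loaded):
        return f"loaded {len(state.body)} characters"
    if isinstance(state, Failed):
        return f"failed: {state.error}"
    # If a new variant is added to RequestState without a branch above,
    # a checker such as mypy should report a type error on this call.
    assert_never(state)
```

If an agent adds a fourth state and forgets to handle it, the checker's complaint becomes feedback it can act on without you.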

Bonus points go to languages that work to produce excellent error messages (think Rust, Elm and even Python). These messages are fed directly back into the LLM, so the more guidance or even suggested resolutions they contain, the better.

Another example of back pressure is the rapid uptake of MCP servers for Playwright or Chrome DevTools, which give agents a way to see rendered pages. In either case, these tools let the agent make a change and compare what it expected to see in the UI against the actual result. Attaching these tools means you no longer need to keep telling the agent that a UI element isn’t loading correctly or that something isn’t centered. Not working on a UI application? Use MCP servers that bridge to LSPs for lints or other feedback.

Even outside of engineering tasks, proof assistants like Lean combined with AI (see recent work on the Erdős Problems, where Kevin Barreto and Liam Price used Aristotle to formalise a proof written by GPT-5.2 Pro into Lean), randomized fuzzing to evaluate correctness when generating CUDA kernels, and logic programming with agents are all powerful combinations, because they let you keep pulling the LLM slot machine lever until the result you have can be trusted. I think that the payoff of investing in higher-quality testing is growing massively, and an increasing part of engineering will involve designing and building back pressure in order to scale the rate at which contributions from agents can be accepted.
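The randomized fuzzing idea can be sketched without any GPU at all: compare a fast candidate against a slow but obviously correct reference on random inputs. The Python below is a toy under that assumption. The function names and the sum-of-squares task are hypothetical stand-ins for a generated kernel and its oracle.

```python
import random
from typing import Callable, List, Optional


def reference_sum_of_squares(xs: List[int]) -> int:
    """Slow but obviously correct oracle."""
    total = 0
    for x in xs:
        total += x * x
    return total


def fuzz_against_reference(
    candidate: Callable[[List[int]], int],
    reference: Callable[[List[int]], int],
    trials: int = 1000,
    seed: int = 0,
) -> Optional[str]:
    """Return None if every trial agrees, else a message with a failing input."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
        try:
            got = candidate(list(xs))  # pass a copy in case it mutates input
        except Exception as exc:
            return f"input {xs!r}: candidate raised {exc!r}"
        want = reference(xs)
        if got != want:
            return f"input {xs!r}: expected {want}, got {got}"
    return None


def buggy_candidate(xs: List[int]) -> int:
    # Deliberately wrong for negative numbers, to show what feedback looks like.
    return sum(x * abs(x) for x in xs)


if __name__ == "__main__":
    print(fuzz_against_reference(buggy_candidate, reference_sum_of_squares))
```

The returned message names a concrete failing input, which is exactly the kind of feedback an agent can use on its next attempt.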

If you’re doing spec-driven development and you want the agent to generate a specific API schema, set up automatic generation of documentation based on the OpenAPI schema from your application, so the agent can compare the result it produced with what it intended to make. There are many more techniques like this that you can apply once you recognize the pattern.
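One way to picture that comparison is a small script that diffs the operations in the intended OpenAPI document against the ones your application actually generates. This is only a sketch: it assumes both documents are already loaded as Python dictionaries, it looks at nothing beyond the `paths` object, and `operations` and `diff_specs` are names I made up.

```python
from typing import Any, Dict, List, Set, Tuple

# Operation keys allowed on an OpenAPI 3 path item; other keys such as
# "parameters" or "summary" are not operations and are skipped.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operations(spec: Dict[str, Any]) -> Set[Tuple[str, str]]:
    """Collect (METHOD, path) pairs from an OpenAPI document's `paths` object."""
    found: Set[Tuple[str, str]] = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:
                found.add((key.upper(), path))
    return found


def diff_specs(intended: Dict[str, Any], generated: Dict[str, Any]) -> List[str]:
    """Describe operations that are missing from, or extra in, the generated spec."""
    want, got = operations(intended), operations(generated)
    problems = [f"missing operation: {m} {p}" for m, p in sorted(want - got)]
    problems += [f"unexpected operation: {m} {p}" for m, p in sorted(got - want)]
    return problems


if __name__ == "__main__":
    intended = {"paths": {"/users": {"get": {}, "post": {}}}}
    generated = {"paths": {"/users": {"get": {}}}}
    for line in diff_specs(intended, generated):
        print(line)
```

An empty list means the generated schema exposes exactly the intended operations. Anything else is a list of plain-text problems to hand back to the agent.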

In your projects, you should think about how you can build back pressure into your workflow. Once you have it, you can loop agents until they have stamped out all of the inconsistencies and issues for you. Without it, you’re going to be stuck spending your time telling the agent about each mistake it makes.
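The looping itself is simple once the checks exist. Below is a toy Python sketch in which the "agent" is a hard-coded stand-in function rather than a real model call. `loop_until_clean`, `toy_agent`, and the two checks are hypothetical names used only to show the shape of the loop.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], List[str]]       # draft -> list of issue messages
Agent = Callable[[str, List[str]], str]  # (draft, issues) -> revised draft


def syntax_check(source: str) -> List[str]:
    """Report a syntax error if the draft does not compile as Python."""
    try:
        compile(source, "<draft>", "exec")
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    return []


def todo_check(source: str) -> List[str]:
    return ["unresolved TODO"] if "TODO" in source else []


def collect(draft: str, checks: List[Check]) -> List[str]:
    return [msg for check in checks for msg in check(draft)]


def loop_until_clean(
    draft: str, agent: Agent, checks: List[Check], max_rounds: int = 5
) -> Tuple[str, List[str]]:
    """Feed issues back to the agent until checks pass or rounds run out."""
    issues = collect(draft, checks)
    rounds = 0
    while issues and rounds < max_rounds:
        draft = agent(draft, issues)
        issues = collect(draft, checks)
        rounds += 1
    return draft, issues  # leftover issues are what still needs a human


def toy_agent(draft: str, issues: List[str]) -> str:
    # Stand-in for a model call: fixes only the first reported issue.
    if issues[0].startswith("syntax error"):
        return draft.replace("def add(a, b)\n", "def add(a, b):\n")
    if issues[0].startswith("unresolved TODO"):
        return draft.replace("  # TODO", "")
    return draft


if __name__ == "__main__":
    start = "def add(a, b)\n    return a + b  # TODO\n"
    final, remaining = loop_until_clean(start, toy_agent, [syntax_check, todo_check])
    print(final)
    print(remaining)
```

The `max_rounds` cap is a deliberate choice: whatever is still failing when the loop gives up is where your own attention is best spent.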

So next time, ask yourself: are you wasting your back pressure?
