Stop using natural language interfaces

Original link: https://tidepool.leaflet.pub/3mcbegnuf2k2i

Balancing Speed and Flexibility in LLM Interfaces

Large language models (LLMs) offer powerful natural-language interaction, but their slow inference (tens of seconds) introduces significant latency compared with traditional graphical user interfaces (GUIs). That is not a reason to avoid LLMs; it is a reason to build smarter interfaces that draw on the strengths of both approaches.

The author built "popup-mcp", a tool that lets an LLM dynamically generate popup windows containing structured GUI elements such as checkboxes, dropdown menus, and sliders. These allow fast, millisecond-level interactions and greatly reduce *amortized* latency: the total time needed to exchange information.

Key features include conditional visibility of elements (creating dialogue trees) and an automatic "Other" option for user clarification, providing an escape hatch when the LLM misreads intent. By replacing multiple chat turns with quick GUI selections, this approach can cut interaction time by 25-75%.

Similar functionality exists in tools such as Claude Code's planning mode, but popup-mcp extends it with conditional elements. The author advocates integrating this kind of structured GUI into LLM chat applications to deliver a responsive yet flexible user experience.


Original article

Natural language is a wonderful interface, but just because we suddenly can doesn't mean we always should. LLM inference is slow and expensive, often taking tens of seconds to complete. Natural language interfaces have orders of magnitude more latency than normal graphic user interfaces. This doesn't mean we shouldn't use LLMs, it just means we need to be smart about how we build interfaces around them.

The Latency Problem

There's a classic CS diagram visualizing latency numbers for various compute operations: nanoseconds to lock a mutex, microseconds to reference memory, milliseconds to read 1 MB from disk. LLM inference usually takes 10s of seconds to complete. Streaming responses help compensate, but it's slow.

Compare interacting with an LLM over multiple turns to filling in a checklist, selecting items from a pulldown menu, setting a value on a slider bar, stepping through a series of such interactions as you fill out a multi-field dialogue. Graphic user interfaces are fast, with responses taking milliseconds, not seconds. But. But: they're not smart, they're not responsive, they don't shape themselves to the conversation with the full benefits of semantic understanding.

This is a post about how to provide the best of both worlds: the clean affordances of structured user interfaces with the flexibility of natural language. Every part of the above interface was generated on the fly by an LLM.

Popup-MCP

This is a post about a tool I made called popup-mcp (MCP is a standardized tool-use interface for LLMs). I built it about 6 months ago and have been experimenting with it as a core part of my LLM interaction modality ever since. It's a big part of what has made me so fond of LLMs from such an early stage. Popup provides a single tool that, when invoked, spawns a popup with an arbitrary collection of GUI elements.
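
For orientation, here is a minimal sketch of what a single-tool stdio MCP server in this shape could look like, using the official Python SDK's FastMCP helper. The tool name, the `spec` format, and the stubbed rendering are my own assumptions for illustration, not popup-mcp's actual code.

```python
# Minimal sketch of a single-tool stdio MCP server, loosely in the shape of
# popup-mcp. The tool name, spec format, and echo stub are illustrative
# assumptions, not the author's implementation.
from mcp.server.fastmcp import FastMCP

server = FastMCP("popup")

@server.tool()
def popup(spec: dict) -> dict:
    """Render a popup described by `spec` and return the user's answers."""
    # A real implementation would block here on a native dialog
    # (tkinter, Qt, ...) until the user submits; this stub just
    # echoes an empty answer for every element.
    return {el["id"]: None for el in spec.get("elements", [])}

if __name__ == "__main__":
    server.run()  # stdio transport by default, matching the post
```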

You can find popup here, along with instructions on how to use it. It's a local MCP tool that uses stdio, which means the process needs to run on the same computer as your LLM client. Popup supports structured GUIs made up of elements including multiple choice checkboxes, drop downs, sliders, and text boxes. These let LLMs render popups like the following:
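
The original post shows a screenshot of a rendered popup at this point. As a rough stand-in, here is what a spec covering those element types might look like; the field names are hypothetical, not popup-mcp's documented schema.

```python
# Hypothetical popup spec covering the element types listed above.
# Field names are illustrative, not popup-mcp's documented schema.
popup_spec = {
    "title": "Plan the next refactor",
    "elements": [
        {"id": "scope", "type": "multiselect",
         "label": "Which modules should change?",
         "options": ["parser", "renderer", "cli"]},
        {"id": "strategy", "type": "dropdown",
         "label": "Migration strategy",
         "options": ["big-bang rewrite", "incremental", "strangler fig"]},
        {"id": "risk", "type": "slider",
         "label": "Acceptable risk (1-10)", "min": 1, "max": 10},
        {"id": "notes", "type": "textbox",
         "label": "Anything else I should know?"},
    ],
}
```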

The popup tool supports conditional visibility to allow for context-specific followup questions. Some elements start hidden, only becoming visible when conditions like 'checkbox clicked', 'slider value > 7', or 'checkbox A clicked && slider B < 7 && slider C > 8' become true. This lets LLMs construct complex and nuanced structures capturing not just their next stage of the conversation but where they think the conversation might go from there. Think of these as being a bit like conditional dialogue trees in CRPGs like Baldur's Gate or interview trees as used in consulting. The previous dialog, for example, expands as follows:
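
The expanded screenshot isn't reproduced here. Continuing the hypothetical schema from the earlier sketch, conditional follow-ups might be expressed with a per-element visibility condition; the `visible_when` field and its expression syntax are my own invention, echoing the conditions quoted above.

```python
# Hypothetical follow-up elements with conditional visibility. The
# `visible_when` field and its expression syntax are invented for
# illustration, mirroring conditions like "slider value > 7".
followups = [
    {"id": "rewrite_timeline", "type": "dropdown",
     "label": "How long can a big-bang rewrite take?",
     "options": ["1 week", "1 month", "1 quarter"],
     "visible_when": "strategy == 'big-bang rewrite'"},
    {"id": "risk_mitigation", "type": "textbox",
     "label": "Risk tolerance is high -- what's the mitigation plan?",
     "visible_when": "risk > 7"},
]
popup_spec["elements"].extend(followups)
```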

Because constructing this tree requires registering nested hypotheticals about how a conversation might progress, it provides a useful window into an LLM's internal cognitive state. You don't just see the question it wants to ask you, you see the followup questions it would ask based on various answer combinations. This is incredibly useful and often shows where the LLM is making incorrect assumptions. More importantly, this is fast. You can quickly explore counterfactuals without having to waste minutes on back-and-forth conversational turns and restarting conversations from checkpoints.

Speaking of incorrect LLM assumptions: every multiselect or dropdown automatically includes an 'Other' option, which - when selected - renders a textbox for the user to elaborate on what the LLM missed. This escape hatch started as an emergent pattern, but I recently modified the tool to _always_ auto-include an escape hatch option on all multiselects and dropdown menus.

This means that you can always intervene to steer the LLM when it has the wrong idea about where a conversation should go.
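
A sketch of how that auto-included escape hatch could work as a post-processing pass over a spec, again using the hypothetical schema from the examples above rather than the tool's real internals:

```python
# Append an "Other" option to every multiselect/dropdown, plus a textbox
# that only appears when "Other" is selected (hypothetical schema).
def add_escape_hatches(spec: dict) -> dict:
    extra = []
    for el in spec["elements"]:
        if el.get("type") in ("multiselect", "dropdown"):
            if "Other" not in el["options"]:
                el["options"].append("Other")
            extra.append({
                "id": f"{el['id']}_other",
                "type": "textbox",
                "label": "Tell me what I missed:",
                "visible_when": f"{el['id']} == 'Other'",
            })
    spec["elements"].extend(extra)
    return spec
```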

Why This Matters

Remember how I started by talking about latency, about how long a single LLM response takes? This combination of nested dialogue trees and escape hatches cuts that by ~25-75%, depending on how well the LLM anticipates where the conversation is going. It's surprising how often a series of dropdowns, each offering its top 3-5 predictions, will contain your next answer, especially when defining technical specs, and when it doesn't, there's always the natural-language escape hatch offered by 'Other'.

Imagine generating a new RPG setting. Your LLM spawns a popup with options for the 5 most common patterns, with focused followup questions for each.
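
The post shows the generated popup here. Under the same hypothetical schema used above, it might look something like this:

```python
# Hypothetical RPG-setting popup, reusing the illustrative schema from above.
rpg_spec = {
    "title": "New RPG setting",
    "elements": [
        {"id": "pattern", "type": "dropdown",
         "label": "Base pattern",
         "options": ["high fantasy", "grimdark", "space opera",
                     "post-apocalyptic", "urban fantasy"]},
        {"id": "magic", "type": "multiselect",
         "label": "High fantasy follow-up: what is magic like?",
         "options": ["ubiquitous", "rare and feared", "institutionalized"],
         "visible_when": "pattern == 'high fantasy'"},
        {"id": "grit", "type": "slider",
         "label": "How gritty? (1-10)", "min": 1, "max": 10},
    ],
}
```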

This isn't a generic GUI; it's fully specialized using everything the LLM knows about you, your project, and the interaction style you prefer. This captures 90% of what you're trying to do, so you select the relevant options and use 'Other' escape hatches to clarify as necessary.

These interactions have latency measured in milliseconds: when you check the 'Other' checkbox, a text box instantly appears, without even a network round-trip's worth of latency. When you're done, your answers are returned to the LLM as a JSON tool response.
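
Continuing the hypothetical schema, the JSON handed back to the LLM might look something like this, with keys mirroring the element ids from the earlier spec:

```python
# Hypothetical shape of the submitted answers returned to the LLM as the
# tool result; keys mirror element ids from the spec sketched earlier.
answers = {
    "scope": ["parser", "cli"],
    "strategy": "Other",
    "strategy_other": "incremental, but parser first",
    "risk": 4,
    "notes": "",
}
```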

You should think of this pattern as providing a reduction in amortized interaction latency: it'll still take 10s of seconds to produce a followup response when you submit a popup dialog, but if your average popup replaces more than one round of chat, you're still taking less time per unit of information exchanged. That's what I mean by amortized latency: that single expensive LLM invocation is amortized over multiple cheap interactions with a deterministically rendered GUI running on your local machine.
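
A back-of-the-envelope version of that amortization, with illustrative numbers (a 20-second LLM turn, effectively free local GUI interactions):

```python
# Illustrative amortized-latency arithmetic; the 20 s figure and the number
# of answers gathered per popup are assumptions, not measurements.
llm_turn_s = 20.0
answers_per_popup = 4            # one popup gathers four answers

plain_chat_s = answers_per_popup * llm_turn_s   # four separate turns: 80 s
with_popup_s = 1 * llm_turn_s                   # one amortized turn:  20 s

print(plain_chat_s / answers_per_popup)  # 20.0 s per answer
print(with_popup_s / answers_per_popup)  #  5.0 s per answer
```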

Claude Code Planning Mode

I started hacking on this a few months before Claude Code released their AskUserQuestion tool (as used in planning mode). The AskUserQuestion tool provides a limited selection of TUI (terminal user interface) elements: multiple-choice and single-choice selections (with an always-included 'Other' option), plus single-choice drop-downs. I originally chose not to publicize my library because of this, but I believe the addition of conditional elements is worth talking about.

Further, I have some feature requests for Claude Code. If anyone at Anthropic happens to be reading this, these would all be pretty easy to implement:

  • Make the TUI interface used by the AskUserQuestion tool open and scriptable, such that plugins and user code can directly modify LLM-generated TUI interfaces, or directly generate their own without requiring a round-trip through the LLM to invoke the tool.

  • Provide pre- and post-AskUserQuestion tool hooks so users can directly invoke code on TUI responses (e.g., filling templated prompts from interface responses in certain contexts).

  • Extend the AskUserQuestion tool to support conditionally rendered elements.

Conclusion

If you have an LLM chat app you should add inline structured GUI elements with conditionally visible followup questions to reduce amortized interaction latency. If you'd like to build on my library or tool definition, or just to talk shop, please reach out. I'd be happy to help. This technique is equally applicable to OS-native popups, terminal user interfaces, and web UIs.

I'll be writing more here. Publishing what I build is one of my core resolutions for 2026, and I have one hell of a backlog. Watch this space.
