# I turned Markdown into a protocol for generative UI

Original link: https://fabian-kuebler.com/posts/markdown-agentic-ui/

## Agentic UI: A New Paradigm for User Interfaces

Eric Schmidt predicts that traditional UIs will fade away, envisioning a future in which AI agents generate interfaces dynamically, on demand. A recent prototype explores this idea: an agentic AI assistant that builds React UIs from scratch using a new approach.

The core idea is **Markdown as protocol**: a single stream carries text, executable code (inside code fences), and data (inside data fences). This leverages LLMs' existing fluency with Markdown and avoids any need for new training. **Streaming execution** lets code run incrementally as it is generated, improving responsiveness. A `mount()` primitive supports building reactive UIs and manages data flow between client, server, and LLM.

The system supports four data-flow patterns: client to server (forms), server to client (live updates), LLM to client (streamed data), and client to server (callbacks). For more complex UIs, a **slot mechanism** lets an initial skeleton interface fill in as content becomes available.

Security is not addressed directly (the prototype relies on existing sandboxing techniques), but the project demonstrates that functional UIs can be built by aligning the system with what LLMs already know (Markdown, TypeScript, and React) rather than requiring them to learn something new. Its success highlights the potential of leveraging existing LLM training data to create a new generation of dynamic, agent-driven interfaces.

Fabian Kübler released "fenced," a prototype exploring a new generative-UI architecture that treats **Markdown as a protocol** for composing text, executable code, and data. The core idea is **streaming execution**: code inside Markdown code fences runs as each statement arrives, enabling dynamic UI creation. A key component is the `mount()` primitive, which lets an agent build React UIs with full data flow between client, server, and LLM. The approach spans a range of expressiveness, from pre-registered UI blocks for safety to full code execution for maximum flexibility.

The Hacker News discussion touched on naming (some suggested "hypertext"), alternatives such as Markdown UI and MDX, and potential applications ranging from interactive dashboards to customizable notebooks. Several commenters pointed to similar work in progress, notably Claude Code's new "channels" feature, and to the robust data models such systems would need.

Original article

“User interfaces are largely going to go away,” Eric Schmidt predicts. Agents will generate whatever UI you need on the fly. I built a prototype to explore the premise.

That’s an agentic AI assistant generating React UIs from scratch, with data flowing between client, server, and LLM. The prototype rests on three ideas:

  1. Markdown as protocol — One stream carrying text, executable code, and data. The LLM already knows how to write it.
  2. Streaming execution — The agent writes and executes code. Each statement executes as soon as it’s complete — no waiting for the full response.
  3. A mount() primitive — One function that lets the agent create reactive UIs, with data flow patterns for client-server-LLM communication.

Check out the repo here.


## The Protocol

How do you combine code execution with text and data? All streamed and interleaved in arbitrary order? In a single protocol?

I kept coming back to markdown. LLMs know markdown cold — formatting, code fences, all of it. Why teach them something new?

So I settled on three block types:

| Block | Syntax | Purpose |
| --- | --- | --- |
| Text | Plain markdown formatting | Streams to the user |
| Code fence | `` ```tsx agent.run `` | Executes on the server in a persistent context |
| Data fence | `` ```json agent.data => "id" `` | Streams data into the UI |

Here’s what this might look like:

Hey! I am the assistant. This text is streamed to the user token by token.
But I can also run code...

```tsx agent.run
const messages = await fetchMessages()
```

I can mount UIs

```tsx agent.run
const fakeMovieData = new StreamedData("fake-movies");

const form = mount({
  streamedData: fakeMovieData,
  ui: ({ streamedData }) => <Movies movies={streamedData} />
})
```

I can stream data into these UIs [data appears one by one...]

```json agent.data => "fake-movies"
[
  { "name": "Blade Runner", "rating": 4.5 },
  { "name": "Dune", "rating": 4.2 }
]
```

All within the same response...

Text, code, and data—interleaved, in any order, any number of times. The parser handles it incrementally as tokens arrive.

And the syntax is naturally extensible. Need a new block type? Add a new fence header: `tsx agent.run` and `json agent.data` are just the first two.
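To show how little machinery the dispatch needs, here is a hypothetical block parser. It is non-streaming for brevity (the real one consumes tokens incrementally), and the `Block` type and `parseBlocks` name are mine, not the project's:

```typescript
type Block =
  | { kind: "text"; content: string }
  | { kind: "fence"; header: string; content: string };

// Split a markdown stream into text and fenced blocks, dispatching on
// the fence header ("tsx agent.run", 'json agent.data => "id"', ...).
// Adding a new block type requires no parser changes, only a new header.
function parseBlocks(markdown: string): Block[] {
  const blocks: Block[] = [];
  let fence: { header: string; lines: string[] } | null = null;
  let text: string[] = [];
  for (const line of markdown.split("\n")) {
    if (fence) {
      if (line.trim() === "```") {
        // Closing fence: emit the completed code/data block.
        blocks.push({ kind: "fence", header: fence.header, content: fence.lines.join("\n") });
        fence = null;
      } else {
        fence.lines.push(line);
      }
    } else if (line.startsWith("```")) {
      // Opening fence: flush pending text, then capture the header.
      if (text.length) { blocks.push({ kind: "text", content: text.join("\n") }); text = []; }
      fence = { header: line.slice(3).trim(), lines: [] };
    } else {
      text.push(line);
    }
  }
  if (text.length) blocks.push({ kind: "text", content: text.join("\n") });
  return blocks;
}
```

A consumer would then route `fence` blocks by header: `tsx agent.run` to the executor, `json agent.data` to the data stream, everything else to the chat transcript.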

## The Feedback Loop

The feedback loop is simple: console.log is how the agent talks to itself. It works like this:

  1. LLM generates markdown with code blocks
  2. Text streams to the user, code executes incrementally on the server
  3. console.* output and exceptions feed back to the LLM as a new turn
  4. If there’s no output or exceptions — done, wait for a new user query
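The four steps above can be sketched as an outer loop. This is a hypothetical reconstruction, not the project's actual API; the `Llm` and `Executor` interfaces are assumptions standing in for the real components:

```typescript
type Turn = { role: "user" | "assistant" | "tool"; content: string };

// Stand-ins: the LLM produces the next markdown response for a history;
// the executor runs its code fences and returns console output + errors.
interface Llm { complete(history: Turn[]): Promise<string>; }
interface Executor { run(markdown: string): Promise<string>; }

async function agentLoop(llm: Llm, executor: Executor, history: Turn[]): Promise<void> {
  for (;;) {
    const response = await llm.complete(history);     // 1. generate markdown
    history.push({ role: "assistant", content: response });
    const output = await executor.run(response);      // 2. execute code incrementally
    if (output === "") return;                        // 4. nothing to report: wait for user
    history.push({ role: "tool", content: output });  // 3. feed output back as a new turn
  }
}
```

The loop terminates exactly when a response produces no console output and no exceptions, which is what lets the agent chain turns off its own execution results.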

This means the agent can react to its own execution:

How many messages did I get?
```tsx agent.run
const messages = await fetchMessages();
console.log('messagesCount:', messages.length);
```
[runtime transcript]
messagesCount: 4
You have four new messages.

Or it can pause and wait for user input:

```tsx agent.run
const form = mount({ /* ... */ });
const answer = await form.result;  // Blocks until user submits
console.log("user:responded", answer);
```

## Streaming Execution

I wanted statements to execute as the LLM generated them, without waiting for the code fence to close. The result would be a more responsive user experience—API calls start, UI renders, errors surface, all while the LLM is still sending tokens.

The problem: streaming execution isn’t a standard primitive yet. No runtime lets you feed in tokens and execute statements as they complete, with shared context and top-level await.

I ended up building bun-streaming-exec to handle this, using vm.Script with some “creative” wrapping. I wrote a dedicated article about the approach if you want the deep dive.

Is it cursed? Yes. Works? Mostly.


## Agentic UI

With text, code, and data in one stream, you have most of the building blocks for agentic UI. The missing piece is a way to turn code into live interfaces. For UI, React is the obvious choice. LLMs have seen millions of React components. They know JSX.

The core primitive is mount():

```tsx agent.run
mount({
  ui: () => <Card>Hello from the agent!</Card>
});
```

The LLM generates the code, the server executes it. mount() serializes the React component and sends it over the wire. The client renders it inside the chat.

The real power comes from data flow, though.

## Four Ways Data Can Move

Building this, I ended up with four distinct patterns for moving data between server, client, and LLM:

1. Client → Server (forms)

The agent can wait for user input:

```tsx agent.run
const form = mount({
  outputSchema: z.object({ name: z.string().min(1) }),
  ui: ({ output }) => (
    <Box>
      <TextField {...output.name} label="Your name" />
      <Button type="submit" {...output}>Submit</Button>
    </Box>
  )
});
const { name } = await form.result;  // Blocks until submit
console.log("user:responded", name);
```

{...output.name} wires up the field. await form.result pauses execution until the user submits. The result feeds back to the LLM via console.log.

2. Server → Client (live updates)

Server-side mutations transparently update the UI:

```tsx agent.run
const data = new Data({ progress: 0 });
mount({
  data,
  ui: ({ data }) => <LinearProgress value={data.progress} />
});

data.progress = 40;  // UI updates immediately
```

Under the hood, Data objects are proxies. Mutations are detected, serialized as patches, sent over WebSocket, and applied on the client.
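The proxy mechanism can be sketched in a few lines. This is a minimal, flat-object illustration with names of my own choosing (`Patch`, `makeData`); the real Data class has to deal with nesting and transport too:

```typescript
type Patch = { path: string; value: unknown };

// Wrap a state object in a Proxy: every property write is applied
// locally and also recorded as a patch, handed to a transport callback
// (standing in for the WebSocket channel to the client).
function makeData<T extends object>(initial: T, send: (patch: Patch) => void): T {
  return new Proxy(initial, {
    set(target, prop, value) {
      Reflect.set(target, prop, value);     // apply the mutation locally
      send({ path: String(prop), value });  // forward it as a patch
      return true;
    },
  });
}
```

On the other end, the client applies each patch to its mirrored state, which re-renders any mounted component reading that field.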

3. LLM → Client (streaming)

The LLM can stream JSON directly into the UI:

```tsx agent.run
const movies = new StreamedData("movies-list");
mount({
  streamedData: movies,
  ui: ({ streamedData }) => (
    <Card>
      {streamedData?.map((movie, i) => (
        <Recommendation key={i} rating={movie.rating}>{movie.name}</Recommendation>
      )) ?? <Loading />}
    </Card>
  )
});
```

```json agent.data => "movies-list"
[
  { "name": "Blade Runner", "rating": 4.5 },
  { "name": "Dune", "rating": 4.2 }
]
```

The JSON streams token-by-token. The client parses incrementally using jsonriver, updating the UI as data arrives. Once complete, the server can access it too via the StreamedData object.
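A crude stand-in for what incremental parsing does (jsonriver itself is far more robust; this helper and its name are mine): after each chunk, close whatever brackets are still open and attempt a full `JSON.parse`, yielding the most complete value seen so far:

```typescript
// Try to parse a prefix of a JSON stream. Returns the best-effort
// value, or undefined if the prefix is not yet parseable. Caveat:
// a partially streamed number may parse as a shorter number.
function tryPartialParse(buffer: string): unknown | undefined {
  const closers: string[] = [];
  let inString = false;
  for (let i = 0; i < buffer.length; i++) {
    const c = buffer[i];
    if (inString) {
      if (c === "\\") i++;                 // skip the escaped character
      else if (c === '"') inString = false;
    } else if (c === '"') inString = true;
    else if (c === "{") closers.push("}");
    else if (c === "[") closers.push("]");
    else if (c === "}" || c === "]") closers.pop();
  }
  // Trim a trailing comma, terminate an open string, close open brackets.
  const trimmed = buffer.replace(/,\s*$/, "");
  const candidate = (inString ? trimmed + '"' : trimmed) + closers.reverse().join("");
  try {
    return JSON.parse(candidate);
  } catch {
    return undefined;                      // not enough tokens yet
  }
}
```

Run after every chunk, this yields a growing array the UI can render immediately, which is the behavior the movie list above relies on.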

4. Client → Server (callbacks)

For live interactions inside the UI:

```tsx agent.run
const data = new Data({ messages, loading: false });
const onRefresh = async () => {
  data.loading = true;
  data.messages = await loadMessages();
  data.loading = false;
};

mount({
  data,
  callbacks: { onRefresh },
  ui: ({ data, callbacks }) => (
    <Card>
      <MessageList messages={data.messages} />
      <RefreshButton loading={data.loading} onClick={callbacks.onRefresh} />
    </Card>
  )
});
```

Clicking the button invokes a server-side function. The callback fetches fresh data, updates state, and the UI reflects it — all in code, without triggering a new LLM turn.

## Slots

As UIs get more complex, the user has to wait longer for the LLM to generate the code. For more elaborate UIs, there’s a slot mechanism: the agent can mount a skeleton interface first and then inject the heavier sections later.

Combined with streaming execution, the skeleton appears the moment its mount() statement completes. Each mountSlot() call fills in a section as soon as the LLM finishes generating it:

```tsx agent.run
const shell = mount({
  data, callbacks: { onResolve },
  ui: () => (
    <Card>
      <Slot name="stats" fallback={<Skeleton variant="rectangular" height={120} />} />
      <Slot name="blockers" fallback={<Skeleton variant="rectangular" height={80} />} />
    </Card>
  ),
});

shell.mountSlot("stats", ({ data }) => <StatsRow data={data} />);
shell.mountSlot("blockers", ({ callbacks }) => <BlockerList onResolve={callbacks.onResolve} />);
```

Slots share the same context as their parent: data, callbacks, streamed data. This means slots stay reactive across each other. A callback in one slot can mutate shared data, and every other slot that reads it updates automatically.


## On Security

Both Claude Code and ChatGPT’s Code Interpreter already execute LLM-generated code at scale — sandboxing, capability-based permissions, and static analysis are under active development across the industry. The hard unsolved problem is prompt injection, and that cuts across all agent architectures equally — tool calling, MCP, and code execution alike. This project doesn’t tackle any of that. It explores the layer above: what you can build once you assume security is reasonably solved. We’re not fully there yet.


## Why It Works

I built this prototype to see if markdown could actually work as a protocol for agentic UI without any finetuning. When I let it run the first time, I was surprised. The model picked it up immediately. It was not perfect. But the core idea just worked.

That’s because every design choice here optimizes for one thing: LLM ergonomics.

Markdown with code fences because LLMs have trained on billions of docs. TypeScript because it bridges server and client in the most-used language on GitHub. React because it’s the UI framework they know best. mount() because its building blocks — awaitable results, callbacks, Zod schemas — are patterns the model has seen millions of times.

The system doesn’t teach the model anything new. It arranges patterns the model already knows into a system that actually runs.

You could design a new protocol for agentic UI from scratch. Or you could just match the runtime to the model’s training data: markdown.

Check out the repo here.
