How we made v0 an effective coding agent

Original link: https://vercel.com/blog/how-we-made-v0-an-effective-coding-agent

## Improving code generation reliability with a composite pipeline

To raise the success rate of code generation, especially for building functional websites, v0 uses a multi-step "composite" pipeline rather than relying on a large language model (LLM) alone. The heart of the pipeline is proactively addressing the LLM's common failure points.

Three key components drive the improvement: the **dynamic system prompt**, **LLM Suspense**, and **autofixers**. The dynamic prompt injects up-to-date information (such as the current SDK version) directly into the LLM's context, avoiding reliance on potentially stale training data or web search. **LLM Suspense** manipulates the streamed output in real time, correcting issues such as long URLs or references to outdated libraries, without the user noticing any intermediate problems. Finally, **autofixers** handle more complex issues, such as missing dependencies or syntax errors, by parsing the code and applying deterministic or model-driven fixes *after* the initial generation.

The pipeline measurably raises success rates, often by double-digit percentages, by detecting and resolving errors as they occur. By addressing a specific failure mode at each stage, v0 delivers more reliable, functional code generation and a smoother user experience.

## Vercel's v0 coding agent: discussion summary

A recent Hacker News discussion centered on Vercel's "v0" AI coding agent. The article details how Vercel improved v0's effectiveness, focusing on two key strategies: prompt optimization and providing example code. They shorten verbose inputs such as URLs via regex substitution to cut token usage and improve performance. v0 also has access to a "read-only filesystem" of curated code samples for Vercel's own SDKs, letting it recognize and adapt patterns for tasks like image generation and web search integration, and even copy and modify the samples directly.

Commenters debated the approach; some criticized it as a "brute force" solution and questioned the engineering quality of prompt manipulation. Concerns were raised about licensing of the example code, but v0's creators clarified that these are their own SDKs and documentation. Others praised v0 for producing visually appealing designs and prototypes, considered it better than alternatives like Claude Code, and found it useful for inspiration. Some noted that development is shifting from traditional coding toward LLM-assisted development, raising concerns about brittleness and the future of software engineering.

## Original article

Last year we introduced the v0 Composite Model Family, and described how the v0 models operate inside a multi-step agentic pipeline. Three parts of that pipeline have had the greatest impact on reliability. These are the dynamic system prompt, a streaming manipulation layer that we call “LLM Suspense”, and a set of deterministic and model-driven autofixers that run after (or while!) the model finishes streaming its response.
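At a high level, the three stages wrap a single model call. The sketch below is our own illustration of that shape; none of the type or function names come from v0's codebase.

```ts
// Hypothetical types and control flow for the composite pipeline described
// above. These names are assumptions for illustration, not v0 internals.

interface GenerationContext {
  systemPrompt: string;
  userMessage: string;
  files: Record<string, string>; // generated files, keyed by path
}

type PromptInjector = (ctx: GenerationContext) => GenerationContext;      // dynamic system prompt
type StreamRewrite = (chunk: string) => string;                           // "LLM Suspense"
type Autofixer = (ctx: GenerationContext) => Promise<GenerationContext>;  // post-stream fixes

async function runCompositePipeline(
  ctx: GenerationContext,
  injectors: PromptInjector[],
  rewrites: StreamRewrite[],
  autofixers: Autofixer[],
  callModel: (ctx: GenerationContext) => AsyncIterable<string>,
): Promise<GenerationContext> {
  // 1. Assemble the dynamic system prompt before the model is invoked.
  for (const inject of injectors) ctx = inject(ctx);

  // 2. Stream the response, rewriting each chunk on the fly.
  //    (A real rewriter also has to handle matches that span chunks;
  //    see the streaming sketch further down.)
  let output = "";
  for await (const chunk of callModel(ctx)) {
    output += rewrites.reduce((text, rewrite) => rewrite(text), chunk);
  }
  ctx = { ...ctx, files: { ...ctx.files, "generated.tsx": output } };

  // 3. Run autofixers on the completed generation, only when needed.
  for (const fix of autofixers) ctx = await fix(ctx);
  return ctx;
}
```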

## What we optimize for

The primary metric we optimize for is the percentage of successful generations. A successful generation is one that produces a working website in v0’s preview instead of an error or blank screen. But the problem is that LLMs running in isolation encounter various issues when generating code at scale.

In our experience, code generated by LLMs can have errors as often as 10% of the time. Our composite pipeline is able to detect and fix many of these errors in real time as the LLM streams the output. This can lead to a double-digit increase in success rates.

## Dynamic system prompt

Your product’s moat cannot be your system prompt. However, that does not change the fact that the system prompt is your most powerful tool for steering the model.

For example, take AI SDK usage. The AI SDK ships major and minor releases regularly, but models often rely on outdated internal knowledge (their "training cutoff") while we want v0 to use the latest version. That mismatch leads to errors like calling APIs from an older version of the SDK, and those errors directly reduce our success rate.

Many agents rely on web search tools for ingesting new information. Web search is great (v0 uses it too), but it has its faults. You may get back old search results, like outdated blog posts and documentation. Further, many agents have a smaller model summarize the results of web search, which in turn becomes a bad game of telephone between the small model and parent model. The small model may hallucinate, misquote something, or omit important information.

Instead of relying on web search, we detect AI-related intent using embeddings and keyword matching. When a message is tagged as AI-related and relevant to the AI SDK, we inject knowledge into the prompt describing the targeted version of the SDK. We keep this injection consistent to maximize prompt-cache hits and keep token usage low.
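A minimal sketch of that gating, assuming hypothetical `embed` and `cosineSimilarity` helpers and a placeholder knowledge block (the real tagging logic and injected notes are not published):

```ts
// Hypothetical sketch: inject a static AI SDK knowledge block only when the
// message looks AI-related. The helpers, keyword list, and threshold below
// are assumptions, not v0 internals.

const AI_SDK_KEYWORDS = ["ai sdk", "generatetext", "streamtext", "usechat"];

const AI_SDK_KNOWLEDGE = `
(placeholder for version-specific AI SDK notes: the APIs, imports, and
migration reminders for the release v0 currently targets)
`;

async function buildSystemPrompt(
  basePrompt: string,
  userMessage: string,
  embed: (text: string) => Promise<number[]>,
  cosineSimilarity: (a: number[], b: number[]) => number,
  aiTopicEmbedding: number[],
): Promise<string> {
  const text = userMessage.toLowerCase();
  const keywordHit = AI_SDK_KEYWORDS.some((k) => text.includes(k));

  const similarity = cosineSimilarity(await embed(userMessage), aiTopicEmbedding);
  const embeddingHit = similarity > 0.8; // arbitrary illustrative threshold

  // The injected block stays byte-for-byte identical across requests so that
  // prompt caching can reuse it and token usage stays low.
  return keywordHit || embeddingHit ? `${basePrompt}\n${AI_SDK_KNOWLEDGE}` : basePrompt;
}
```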

In addition to text injection, we worked with the AI SDK team to provide examples in the v0 agent’s read-only filesystem. These are hand-curated directories with code samples designed for LLM consumption. When v0 decides to use the SDK, it can search these directories for relevant patterns such as image generation, routing, or integrating web search tools.
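As a rough illustration only (the directory layout and relevance check below are assumptions, not v0's actual filesystem), the lookup could be as simple as scanning the curated directory for samples that mention the topic at hand:

```ts
// Hypothetical sketch of finding curated SDK samples by topic.
import { readdir, readFile } from "node:fs/promises";
import { join } from "node:path";

async function findSdkExamples(topic: string, root = "/examples/ai-sdk"): Promise<string[]> {
  const matches: string[] = [];
  for (const entry of await readdir(root, { withFileTypes: true })) {
    if (!entry.isFile()) continue;
    const path = join(root, entry.name);
    const source = await readFile(path, "utf8");
    // Naive relevance check: topic appears in the file name or its contents.
    if (entry.name.includes(topic) || source.includes(topic)) matches.push(path);
  }
  return matches;
}

// e.g. findSdkExamples("image-generation") might surface a sample the agent
// can copy and adapt rather than writing the integration from scratch.
```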

These dynamic system prompts are used for a variety of topics, including frontend frameworks and integrations.

## LLM Suspense

LLM Suspense is a framework that manipulates text as it streams to the user. This includes actions like find-and-replace for cleaning up incorrect imports, but can become much more sophisticated.

Two examples show the flexibility it provides:

A simple example is substituting long strings the LLM often has to repeat. When a user uploads an attachment, we give v0 a blob storage URL. That URL can be very long (hundreds of characters), which costs tens of tokens and hurts performance.

Before we invoke the LLM, we replace the long URLs with shorter versions that get transformed into the proper URL after the LLM finishes its response. This means the LLM reads and writes fewer tokens, saving our users money and time.
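A sketch of that substitution, with an assumed URL pattern and placeholder format (not v0's actual scheme):

```ts
// Hypothetical sketch: swap long blob URLs for short placeholders before the
// model sees them, then restore the originals in the finished response.
const BLOB_URL_RE = /https:\/\/\S+\.vercel-storage\.com\/\S+/g; // assumed pattern

function shortenUrls(prompt: string) {
  const placeholders = new Map<string, string>();
  let i = 0;
  const shortened = prompt.replace(BLOB_URL_RE, (url) => {
    const placeholder = `https://u.example/${i++}`; // hypothetical short form
    placeholders.set(placeholder, url);
    return placeholder;
  });
  return { shortened, placeholders };
}

function restoreUrls(output: string, placeholders: Map<string, string>): string {
  let restored = output;
  for (const [placeholder, url] of placeholders) {
    restored = restored.split(placeholder).join(url); // replace every occurrence
  }
  return restored;
}
```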

In production, these simple rules handle variations in quoting, formatting, and mixed import blocks. Because this happens during streaming, the user never sees an intermediate incorrect state.
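One way to picture the streaming layer (a sketch under our own assumptions, not v0's implementation) is a `TransformStream` that holds back a small tail of text so find-and-replace rules can fire even when a match straddles chunk boundaries:

```ts
// Sketch: apply rewrite rules to streamed text before it reaches the user.
// The rules, holdback size, and example import rewrite are illustrative.

type RewriteRule = { find: RegExp; replace: string };

function rewriteStream(rules: RewriteRule[], holdback = 128): TransformStream<string, string> {
  let buffer = "";
  // Note: rules are re-applied to the held-back tail, so they should be idempotent.
  const apply = (text: string) =>
    rules.reduce((acc, rule) => acc.replace(rule.find, rule.replace), text);

  return new TransformStream<string, string>({
    transform(chunk, controller) {
      buffer = apply(buffer + chunk);
      if (buffer.length > holdback) {
        // Emit everything except a small tail, which stays buffered in case
        // the next chunk completes a partial match.
        controller.enqueue(buffer.slice(0, -holdback));
        buffer = buffer.slice(-holdback);
      }
    },
    flush(controller) {
      controller.enqueue(apply(buffer));
    },
  });
}

// Illustrative rule: rewrite an outdated import path as it streams by.
const importRules: RewriteRule[] = [
  { find: /from ["']ai\/react["']/g, replace: 'from "@ai-sdk/react"' },
];
```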

Suspense can also handle more complex cases. By default, v0 uses the lucide-react icon library. It updates weekly, adding and removing icons. This means the LLM will often reference icons that no longer exist or never existed.

To correct this deterministically, we:

  1. Embed every icon name in a vector database.

  2. Analyze actual exports from lucide-react at runtime.

  3. Pass through the correct icon when available.

  4. When the icon does not exist, run an embedding search to find the closest match.

  5. Rewrite the import during streaming.

For example, a request for a "Vercel logo icon" might produce:

```ts
import { VercelLogo } from 'lucide-react'
```

LLM Suspense will replace this with:

```ts
import { Triangle as VercelLogo } from 'lucide-react'
```

This process completes within 100 milliseconds and requires no further model calls.
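A compressed sketch of that fallback, with a hypothetical vector-search helper standing in for the embedding lookup:

```ts
// Hypothetical sketch of the icon fallback: pass real lucide-react exports
// through unchanged, otherwise alias the nearest icon found by embedding search.
import * as lucide from "lucide-react";

async function resolveIcon(
  requested: string,
  searchNearestIcon: (name: string) => Promise<string>, // assumed vector-search helper
): Promise<{ importName: string; alias?: string }> {
  // The requested icon actually exists: keep it as-is.
  if (requested in lucide) return { importName: requested };

  // It does not: substitute the closest real icon and alias it, which is what
  // produces `import { Triangle as VercelLogo } from "lucide-react"` above.
  const nearest = await searchNearestIcon(requested);
  return { importName: nearest, alias: requested };
}
```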

## Autofixers

Sometimes, there are issues that our system prompt and LLM Suspense cannot fix. These often involve changes across multiple files or require analyzing the abstract syntax tree (AST).

For these cases, we collect errors after streaming and pass them through our autofixers. These include deterministic fixes and a small, fast, fine-tuned model trained on data from a large volume of real generations.

Some autofix examples include:

  • useQuery and useMutation from @tanstack/react-query require being wrapped in a QueryClientProvider. We parse the AST to check whether they're wrapped, but the autofix model determines where to add it.

  • Completing missing dependencies in package.json by scanning the generated code and deterministically updating the file (sketched below).

  • Repairing common JSX or TypeScript errors that slip through Suspense transformations.

These fixes run in under 250 milliseconds and only when needed, allowing us to maintain low latency while increasing reliability.
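As a rough sketch of the dependency-completion fix (the import scanning and version choice below are our assumptions, not v0's implementation):

```ts
// Hypothetical sketch: scan generated sources for bare package imports and
// add any that are missing from package.json. Only `from "..."` imports are
// handled here; a real fix would cover more forms (require, side-effect imports).

const IMPORT_RE = /from\s+["']([^"'./][^"']*)["']/g;

function addMissingDependencies(
  files: Record<string, string>, // path -> generated source
  packageJson: { dependencies?: Record<string, string> },
) {
  const deps = { ...(packageJson.dependencies ?? {}) };

  for (const source of Object.values(files)) {
    for (const match of source.matchAll(IMPORT_RE)) {
      // Normalize "lodash/debounce" -> "lodash" and "@scope/pkg/sub" -> "@scope/pkg".
      const parts = match[1].split("/");
      const pkg = match[1].startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
      if (pkg.startsWith("node:")) continue; // skip Node built-ins
      if (!(pkg in deps)) deps[pkg] = "latest"; // placeholder; a real fix pins a version
    }
  }
  return { ...packageJson, dependencies: deps };
}
```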

Combining the dynamic system prompt, LLM Suspense, and autofixers gives us a pipeline that produces stable, functioning generations at higher rates than a standalone model. Each part of the pipeline addresses a specific failure mode, and together they significantly increase the likelihood that users see a rendered website in v0 on the first attempt.
