AI should only run as fast as we can catch up

Original link: https://higashi.blog/2025/12/07/ai-verification/

## AI's Reliability Challenge: From "Vibe Coding" to Verification Engineering

Recent experience points to a key challenge in AI's rapid advance: ensuring *reliable* engineering practice. While AI excels at quickly generating code and prototypes - as tools like Gemini demonstrate - its usefulness depends on our ability to verify its output.

A product manager named Eric found that Gemini could create impressive-looking web applications but lacked the technical depth to build production-grade products. In contrast, a senior engineer named Daniel successfully used AI as a "force multiplier" by prompting it to build components within an existing, well-understood framework and then quickly verifying the results.

This difference illustrates a key point: AI's speed is only valuable when we can keep up with verification. Tasks that are fast and intuitive to verify, such as evaluating AI-generated images, are ripe for disruption. Complex tasks that require deep expertise, however, can accumulate "verification debt" - a backlog of unverified AI-generated work.

The author proposes "verification engineering" as the next critical discipline after prompt engineering and context engineering. It focuses on streamlining verification through better prompting, training the right stakeholders, and identifying tasks that are easy to verify. Ultimately, maximizing AI's potential requires prioritizing sound verification practices to ensure reliability and accountability.


Original text

The story of Daniel and Eric

Recently I spoke with two of my friends, both of whom have been having fun playing with AI.

Last month, I met with Eric, a fearless PM at a medium-size startup who recently got into vibe coding with Gemini.

After getting familiar with Gemini, Eric was genuinely amazed by how quickly AI turns a prompt into a playable web application. It served well as a first prototype for communicating ideas to designers and engineers. But Eric really wanted to skip those steps and ship it directly to prod. What he couldn't see was that Gemini had actually built a single-page HTML file that merely looks like a working app. Sadly, one cannot build a reliable enterprise product out of that. And there is no effective way for Eric to catch up on those technical details and outpace the engineering team himself.

Last week, I had coffee with Daniel, a senior staff engineer who recently grew fond of AI coding and found it to be a true force multiplier.

Daniel was skeptical of AI at first, but he hasn't written a single line of code in months. What he does is precisely prompt the AI to create new components in an existing framework (involving Kafka, Postgres, AuthN/Z, and k8s infra) while adhering to certain preexisting paradigms. He spot-checks the correctness of the AI's work and quickly spins up local deployments to verify it is indeed working. Then he pushes the changes through the code review process and lands those features. All without writing a single line of code, and the result is production ready just as if he had written it himself. To Daniel, building and shipping things fast and at scale is simpler than ever.
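As a rough illustration of what that "spot-check plus local deployment" loop might look like when automated, here is a minimal smoke-test sketch. Everything in it - the port, the endpoints, the expected response shape - is a hypothetical stand-in, not Daniel's actual setup:

```python
# A minimal smoke test for an AI-generated service component, run
# against a local deployment before the change goes to code review.
# The base URL, endpoints, and response shape below are hypothetical.
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed local deployment

def check(path: str) -> dict:
    """Fetch a path; urlopen raises on non-2xx status, failing the check."""
    with urllib.request.urlopen(BASE_URL + path) as resp:
        body = resp.read()
        return json.loads(body) if body else {}

if __name__ == "__main__":
    check("/healthz")                      # the service came up at all
    orders = check("/api/orders?limit=1")  # one known-good request
    # Verify the output still follows the preexisting paradigm.
    assert "items" in orders, "response shape drifted"
    print("spot-check passed")
```

The point is not the script itself but that verification here is cheap: a few seconds of machine time plus a skim of the diff, rather than rereading every generated line.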

Interpolating between the two stories

After speaking with Eric and Daniel, I suddenly felt that there is an overarching theme around the use of AI that we can probably interpolate out of these two stories. And after pondering over a weekend, I think I can attempt to describe it now: it's the problem of reliable engineering - how can we make AI work reliably?

With AI superpowers, one can task it to do all sorts of crazy things on the internet just by typing a few lines of prompt. AI thinks and learns faster than us; this is undeniable now. However, to make AI's work actually useful (not only working, but reliable and trustworthy), we also need to catch up with what the AI does as quickly as possible.

It's almost like we need to send the AI off to learn and think as fast as possible, but we also need to catch up as soon as possible to make it all relevant. And the speed at which we catch up is critical to whether AI can help us do these tasks effectively. In Daniel's case, he can spot-check, basically just skim through the AI's work, and know for sure it's doing the right thing with a few simple test steps to verify, hence his results are more reliable. Whereas Eric would basically need to learn software development from the bottom up to comprehend what the AI has done, and that really doesn't give him the edge to outpace engineering teams and ship features reliably by himself.

Where AI exploded: fast verification, slow learning and creation

To generalize the problem, I think all the tasks we do can be broken down into two parts: learning/creation and verification - basically doing the task, and checking that the task is done right. Interestingly, this gives us a good perspective on our relationship with AI when performing such tasks.

Effort-wise, if verification ≪ learning/creation, one can very effectively check the AI's work and be confident about its reliability.

If verification ≈ learning/creation, one spends a comparable amount of time checking the AI's work. It's not a big win; maybe AI becomes a good automation script that cuts down some boilerplate.

If verification ≫ learning/creation, one cannot be sure about the AI's work that easily, and we are in vibe-land.
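To make the three regimes concrete, here is a toy back-of-the-envelope model: if the AI's generation time is negligible, the human-side speedup on a task is roughly the ratio of creation effort to verification effort. All numbers below are invented purely for illustration:

```python
# Toy model: if AI creation is near-instant, the effective speedup a
# human sees is bounded by how long verification takes them.
# All durations are invented for illustration (seconds of human effort).

def effective_speedup(t_create: float, t_verify: float) -> float:
    """Time to do the task yourself vs. time to verify the AI's attempt."""
    return t_create / t_verify

scenarios = [
    ("image generation (verify << create)", 3_600.0, 1.0),
    ("boilerplate script (verify ~= create)", 600.0, 500.0),
    ("deep domain work  (verify >> create)", 3_600.0, 100_000.0),
]
for name, t_create, t_verify in scenarios:
    print(f"{name}: ~{effective_speedup(t_create, t_verify):.3g}x")
```

Under these made-up numbers the first scenario is a thousand-fold win, the second barely breaks even, and the third is a net loss: the AI is "fast" in all three, but only the verification cost decides whether it helps.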

A very good example of the first category is image (and video) generation. Drawing or rendering a realistic-looking image is a crazily hard task. Have you tried to make a slide look nicer? It takes me literally hours to center the text boxes to make it look "good". However, you really just need to glance at the output of Nano Banana to tell whether it's a good render or a bad one, based on how you feel. The verification is literally instantaneous and effortless because it's all encoded as feelings or vibes in your brain. "Does this look right?" can probably be answered in the span of milliseconds by your visual cortex. There is also no special knowledge required - human beings have been evaluating visual images since birth; it's hardwired into our instincts.

This significant cost asymmetry goes a long way toward explaining why AI image generation exploded. If we look for similar scenarios, we can probably identify other "killer" use cases of AI as well.

Verification debt: scarier than tech debt

However, if we go down to the bottom of the spectrum where verification becomes more intense - requiring domain knowledge, technical expertise, and industry know-how to tell whether the AI is producing slop - we enter a dark age of piling verification debt. More things are being created, but we lag behind in checking whether any of it actually works to our satisfaction.

If an organization keeps vibe-coding without catching up on verification, those tasks can quickly pile up as "debts" that need to be verified. When verification becomes the bottleneck, dangerous things can happen if we still want to move fast - we risk running unverified code with unexpected side effects that are yet to be validated. The same applies to other fields - imagine asking AI to craft a new vaccine and not wanting to wait for the FDA before using it.
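As a toy illustration of how this debt compounds, think of generation and verification as a simple queue: if changes are produced faster than they can be checked, the unverified backlog grows without bound. Both rates below are made up for illustration:

```python
# Toy queue model of verification debt: AI produces changes faster
# than humans can verify them, so the unverified backlog only grows.
# Both rates are invented for illustration.

create_rate = 10.0  # AI-generated changes landing per day
verify_rate = 4.0   # changes a human team can actually verify per day

backlog = 0.0  # unverified changes carried over
for day in range(1, 11):
    backlog += create_rate                # new unverified work arrives
    backlog -= min(verify_rate, backlog)  # humans verify what they can
    print(f"day {day:2d}: unverified backlog = {backlog:g} changes")
```

In this sketch the backlog grows by six changes every day, forever - there is no steady state until either creation slows down or verification speeds up.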

I've come across a few blog posts that talk about Verification Debt already. I think it's genuinely a good problem for technical leaders to keep in mind in this era.

Verification Engineering is the next Context Engineering

AI can only reliably run as fast as we can check its work. It's almost like a claim from complexity theory. But I believe it needs to be the case if we are to harvest the exponential warp speed of AI while remaining robust and competent, as these technologies ultimately serve human beings, and we humans need technology to be reliable and accountable - we are already flaky enough ;)

This brings up the topic of Verification Engineering. I believe this can be the big thing after Context Engineering (which is the big thing after Prompt Engineering). By cleverly rearranging tasks and using nice abstractions and frameworks, we can make verification of AI-performed tasks easier and use AI to ship more solid products to the world. No more slop.

I can think of a few ideas to kick off verification engineering:

  • How to craft more technically precise prompts that guide AI to do things surgically, rather than vibing it.
  • How to train more capable technical stakeholders who can effectively verify and approve what AI has done.
  • How to find more tasks that are relatively easy to verify but rather hard to create (see the sketch after this list).
  • How to push the theoretical boundaries of what we can succinctly verify (complexity theory strikes again).
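One concrete instance of the "easy to verify, hard to create" idea is property-based testing: a reviewer states a few properties that any correct implementation must satisfy, and a tool hammers the AI's code with generated inputs. Here is a minimal sketch using the (real) hypothesis library, with `ai_sort` as a hypothetical stand-in for AI-generated code under review:

```python
# Verifying a hypothetical AI-written function by its properties rather
# than by reading its implementation. Requires: pip install hypothesis
from collections import Counter
from hypothesis import given, strategies as st

def ai_sort(xs: list) -> list:
    """Stand-in for an AI-generated implementation under review."""
    return sorted(xs)

@given(st.lists(st.integers()))
def test_ai_sort_properties(xs):
    out = ai_sort(xs)
    assert all(a <= b for a, b in zip(out, out[1:]))  # output is ordered
    assert Counter(out) == Counter(xs)                # same elements, same counts

if __name__ == "__main__":
    test_ai_sort_properties()  # hypothesis runs many generated cases
    print("all properties held")
```

Stating these two properties takes a reviewer a minute; writing (or auditing) the implementation itself could take far longer. That asymmetry is exactly the first regime from earlier.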

Where next

I believe whoever figures out how to effectively verify more complex tasks using human brains can gain the most from the AI boom. Maybe we need to discard traditional programming languages and start programming in abstract graph-like dataflow representations, where one can easily tell whether a thing is done right or wrong regardless of language or implementation details.

Maybe our future is like the one depicted in Severance - we look at computer screens full of wiggly numbers, and whatever "feels right" is the right thing to do. We can harvest these effortless, low-latency "feelings" that nature gave us to make AI do more powerful work.
