足够详细的规格就是代码。
A sufficiently detailed spec is code

原始链接: https://haskellforall.com/2026/03/a-sufficiently-detailed-spec-is-code

## 代码生成幻觉:从规范入手 最近关于“代理编程”的炒作声称能够*仅*从规范生成代码,承诺一种转变,工程师将管理代理而不是直接编写代码。然而,这基于两个错误的假设。首先,规范本质上比它们产生的代码更简单——这是一个错误的前提,因为精确的规范往往在细节上*变得*像代码一样。OpenAI 的 Symphony 项目,被吹捧为由规范驱动,就证明了这一点,其中包含伪代码、数据库模式,甚至 AI 的“作弊表”——模糊了规范和实现之间的界限。 其次,认为规范工作自动比编码更有思考是错误的。当前加速交付的压力常常导致仓促、考虑不周的规范——本质上是缺乏连贯性的“AI 生成的垃圾”。 试图从这些有缺陷的规范中自动生成代码是明显失败的。尝试使用 AI 代理构建 Symphony 本身导致了错误,最终,无果而终。这与 YAML 等复杂规范的问题相呼应,即使有详细的文档,完全符合仍然难以捉摸。最终,你无法捷径工程所需的精确度;试图这样做只会转移劳动,并产生不可靠的结果。规范是有价值的,但并非作为节省时间的工具——它们需要仔细的思考,而这在当今的技术环境中正日益受到损害。

一个黑客新闻的讨论围绕着一个观点:足够详细的规范可以有效地*成为*代码。最初的帖子提出了这个观点,引发了对细微之处的探讨。 一位评论员澄清说,正式标准(如C/C++)并非真正的“代码”,因为它们本质上允许实现定义的行为——这是它们灵活性的关键方面。另一位分享了一个实验,构建一种由确定性代码*和*LLM解释的语言,发现由于人类和人工智能对语言和数据处理理解不同,这很困难。 第三位评论员建议将规范视为定义一组预期行为,例如多项式方程,然后可以用各种方式实现。他们以无锁队列为例——规范定义了*需要什么*,而不是*如何构建它*。
相关文章

原文

This post is essentially this comic strip expanded into a full-length post:

For a long time I didn't need a post like the one I'm about to write. If someone brought up the idea of generating code from specifications I'd share the above image with them and that would usually do the trick.

However, agentic coding advocates claim to have found a way to defy gravity and generate code purely from specification documents. Moreover, they've also muddied the waters enough that I believe the above comic strip warrants additional commentary for why their claims are misleading.

In my experience their advocacy is rooted in two common misconceptions:

  • Misconception 1: specification documents are simpler than the corresponding code

    They lean on this misconception when marketing agentic coding to believers who think of agentic coding as the next generation of outsourcing. They dream of engineers being turned into managers who author specification documents which they farm out to a team of agents to do the work, which only works if it's cheaper to specify the work than to do the work.

  • Misconception 2: specification work must be more thoughtful than coding work

    They lean on this misconception when marketing agentic coding to skeptics concerned that agentic coding will produce unmaintainable slop. The argument is that filtering the work through a specification document will improve quality and promote better engineering practices.

I'll break down why I believe those are misconceptions using a concrete example.

Thinly-veiled code

I'll begin from OpenAI's Symphony project, which OpenAI heralds as as an example of how to generate a project from a specification document.

The Symphony project is an agent orchestrator that claims to be generated from a "specification" (SPEC.md), and I say "specification" in quotes because this file is less of a specification and more like pseudocode in markdown form. If you scratch the surface of the document you'll find it contains things like prose dumps of the database schema:

4.1.6 Live Session (Agent Session Metadata)

State tracked while a coding-agent subprocess is running.

Fields:

  • session_id (string, <thread_id>-<turn_id>)
  • thread_id (string)
  • turn_id (string)
  • codex_app_server_pid (string or null)
  • last_codex_event (string/enum or null)
  • last_codex_timestamp (timestamp or null)
  • last_codex_message (summarized payload)
  • codex_input_tokens (integer)
  • codex_output_tokens (integer)
  • codex_total_tokens (integer)
  • last_reported_input_tokens (integer)
  • last_reported_output_tokens (integer)
  • last_reported_total_tokens (integer)
  • turn_count (integer)
    • Number of coding-agent turns started within the current worker lifetime.

… or prose dumps of code:

8.3 Concurrency Control

Global limit:

  • available_slots = max(max_concurrent_agents - running_count, 0)

Per-state limit:

  • max_concurrent_agents_by_state[state] if present (state key normalized)
  • otherwise fallback to global limit

The runtime counts issues by their current tracked state in the running map.

8.4 Retry and Backoff

Retry entry creation:

  • Cancel any existing retry timer for the same issue.
  • Store attempt, identifier, error, due_at_ms, and new timer handle.

Backoff formula:

  • Normal continuation retries after a clean worker exit use a short fixed delay of 1000 ms.
  • Failure-driven retries use delay = min(10000 * 2^(attempt - 1), agent.max_retry_backoff_ms).
  • Power is capped by the configured max retry backoff (default 300000 / 5m).

Retry handling behavior:

  1. Fetch active candidate issues (not all issues).
  2. Find the specific issue by issue_id.
  3. If not found, release claim.
  4. If found and still candidate-eligible:
    • Dispatch if slots are available.
    • Otherwise requeue with error no available orchestrator slots.
  5. If found but no longer active, release claim.

… or sections added explicitly added to babysit the model's code generation, like this:

6.4 Config Fields Summary (Cheat Sheet)

This section is intentionally redundant so a coding agent can implement the config layer quickly.

  • tracker.kind: string, required, currently linear
  • tracker.endpoint: string, default https://api.linear.app/graphql when tracker.kind=linear

… or outright code1:

16.1 Service Startup

function start_service():
  configure_logging()
  start_observability_outputs()
  start_workflow_watch(on_change=reload_and_reapply_workflow)

  state = {
    poll_interval_ms: get_config_poll_interval_ms(),
    max_concurrent_agents: get_config_max_concurrent_agents(),
    running: {},
    claimed: set(),
    retry_attempts: {},
    completed: set(),
    codex_totals: {input_tokens: 0, output_tokens: 0, total_tokens: 0, seconds_running: 0},
    codex_rate_limits: null
  }

  validation = validate_dispatch_config()
  if validation is not ok:
    log_validation_error(validation)
    fail_startup(validation)

  startup_terminal_workspace_cleanup()
  schedule_tick(delay_ms=0)

  event_loop(state)

I feel like it's pretty disingenuous for agentic coding advocates to market this as a substitute for code when the specification document reads like code (or in some cases is literally code).

Don't get me wrong: I'm not saying that specification documents should never include pseudocode or a reference implementation; those are both fairly common in specification work. However, you can't claim that specification documents are a substitute for code when they read like code.

I bring this up because I believe Symphony illustrates the first misconception well:

Misconception 1: specification documents are simpler than the corresponding code

If you try to make a specification document precise enough to reliably generate a working implementation you must necessarily contort the document into code or something strongly resembling code (like highly structured and formal English).

Dijkstra explains why this is inevitable:

We know in the meantime that the choice of an interface is not just a division of (a fixed amount of) labour, because the work involved in co-operating and communicating across the interface has to be added. We know in the meantime —from sobering experience, I may add— that a change of interface can easily increase at both sides of the fence the amount of work to be done (even drastically so). Hence the increased preference for what are now called "narrow interfaces". Therefore, although changing to communication between machine and man conducted in the latter's native tongue would greatly increase the machine's burden, we have to challenge the assumption that this would simplify man's life.

A short look at the history of mathematics shows how justified this challenge is. Greek mathematics got stuck because it remained a verbal, pictorial activity, Moslem "algebra", after a timid attempt at symbolism, died when it returned to the rhetoric style, and the modern civilized world could only emerge —for better or for worse— when Western Europe could free itself from the fetters of medieval scholasticism —a vain attempt at verbal precision!— thanks to the carefully, or at least consciously designed formal symbolisms that we owe to people like Vieta, Descartes, Leibniz, and (later) Boole.

Agentic coders are learning the hard way that you can't escape the "narrow interfaces" (read: code) that engineering labor requires; you can only transmute that labor into something superficially different which still demands the same precision.

Flakiness

Also, generating code from specifications doesn't even reliably work! I actually tried to do what the Symphony README suggested:

Tell your favorite coding agent to build Symphony in a programming language of your choice:

Implement Symphony according to the following spec: https://github.com/openai/symphony/blob/main/SPEC.md

I asked Claude Code to build Symphony in a programming language of my choice (Haskell2, if you couldn't guess from the name of my blog) and it did not work. You can find the result in my Gabriella439/symphony-haskell repository.

Not only were there multiple bugs (which I had to prompt Claude to fix and you can find those fixes in the commit history), but even when things "worked" (meaning: no error messages) the codex agent just spun silently without making any progress on the following sample Linear ticket:

Create a new blank repository

No need to create a GitHub project. Just create a blank git repository

In other words, Symphony's "vain attempt at verbal precision" (to use Dijkstra's words) still fails to reliably generate a working implementation3.

This problem also isn't limited to Symphony: we see this same problem even for well-known specifications like YAML. The YAML specification is extremely detailed, widely used, and includes a conformance test suite and the vast majority of YAML implementations still do not conform fully to the spec.

Symphony could try to fix the flakiness by expanding the specification but it's already pretty long, clocking in at 1/6 the size of the included Elixir implementation! If the specification were to grow any further they would recapitulate Borges's "On Exactitude in Science" short story:

…In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.

Slop

Specification work is supposed to be harder than coding. Typically the reason we write specification documents before doing the work is to encourage viewing the project through a contemplative and critical lens, because once coding begins we switch gears and become driven with a bias to action.

So then why do I say that this is a misconception:

Misconception 2: specification work must be more thoughtful than coding work

The problem is that this sort of thoughtfulness is no longer something we can take for granted thanks to the industry push to reduce and devalue labor at tech companies. When you begin from the premise of "I told people specification work should be easier than coding" then you set yourself up to fail. There is no way that you can do the difficult and uncomfortable work that specification writing requires if you optimize for delivery speed. That's how you get something like the Symphony "specification" that looks superficially like a specification document but then falls apart under closer scrutiny.

In fact, the Symphony specification reads as AI-written slop. Section 10.5 is a particularly egregious example of the slop I'm talking about, such as this excerpt:

linear_graphql extension contract:

  • Purpose: execute a raw GraphQL query or mutation against Linear using Symphony's configured tracker auth for the current session.

  • Availability: only meaningful when tracker.kind == "linear" and valid Linear auth is configured.

  • Preferred input shape:

    {
      "query": "single GraphQL query or mutation document",
      "variables": {
        "optional": "graphql variables object"
      }
    }
  • query must be a non-empty string.

  • query must contain exactly one GraphQL operation.

  • variables is optional and, when present, must be a JSON object.

  • Implementations may additionally accept a raw GraphQL query string as shorthand input.

  • Execute one GraphQL operation per tool call.

  • If the provided document contains multiple operations, reject the tool call as invalid input.

  • operationName selection is intentionally out of scope for this extension.

  • Reuse the configured Linear endpoint and auth from the active Symphony workflow/runtime config; do not require the coding agent to read raw tokens from disk.

  • Tool result semantics:

    • transport success + no top-level GraphQL errors -> success=true
    • top-level GraphQL errors present -> success=false, but preserve the GraphQL response body for debugging
    • invalid input, missing auth, or transport failure -> success=false with an error payload
  • Return the GraphQL response or error payload as structured tool output that the model can inspect in-session.

That is a grab bag of "specification-shaped" sentences that reads like an agent's work product: lacking coherence, purpose, or understanding of the bigger picture.

A specification document like this must necessarily be slop, even if it were authored by a human, because they're optimizing for delivery time rather than coherence or clarity. In the current engineering climate we can no longer take for granted that specifications are the product of careful thought and deliberation.

Conclusion

Specifications were never meant to be time-saving devices. If you are optimizing for delivery time then you are likely better off authoring the code directly rather than going through an intermediate specification document.

More generally, the principle of "garbage in, garbage out" applies here. There is no world where you input a document lacking clarity and detail and get a coding agent to reliably fill in that missing clarity and detail. Coding agents are not mind readers and even if they were there isn't much they can do if your own thoughts are confused.

联系我们 contact @ memedata.com