/architect: Reduce Fable tokens by 80%, Fable orchestrates/reviews, Codex builds

Original link: https://github.com/DanMcInerney/architect-loop

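**Architect-Loop** is an automated software development framework that uses a two-agent system to carry out complex engineering and research tasks without API keys or extra token charges. It runs on your existing Claude and ChatGPT (Codex) subscriptions, and it avoids the "naive shared-file" coordination trap common in agent workflows.

**How it works:**

* **Architect (Claude Fable):** Acts as the supervisor. It splits the project into independent "slices", defines read-only acceptance gates, and makes the final judgment. It never writes code, which keeps quality control strict.
* **Builder/researcher (GPT-5.5 Codex):** Executes the work in parallel in separate Git worktrees. Builders must challenge the spec before building, and they are physically prevented from tampering with the gate files.

The system has a **"scout-first" research flow**: a preliminary search maps the topic before specialized research lanes are dispatched. This keeps the output evidence-based and makes citations mandatory.

**Core advantages:**

* **Safety:** Builders cannot commit changes; the architect verifies the diffs and runs the gate commands itself.
* **Efficiency:** Separate worktrees prevent builders from conflicting with each other.
* **Transparency:** All state is stored in the repo (no "ghost" memory), and long-running processes include built-in liveness checks.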

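This Hacker News discussion centers on a project by Dan McInerney that aims to cut the token usage of "Fable" (an AI coding agent) by 80% through a two-tier architecture: an expensive high-reasoning model handles planning, while a cheaper model carries out the implementation. Commenters note that this strategy of planning with a premium model and executing with a cheap one is a recurring trend in agent development. The community nonetheless remains skeptical about the current state of AI agents. Users shared frustrations with context loss, goal drift, and repetitive, illogical behavior (such as a model repeatedly running commands it was explicitly forbidden to run). Some users argue that the performance gap between models is still large, while others argue that the whole "agentic" workflow often creates more overhead than writing the code by hand. Overall, the discussion highlights the persistent gap between the theoretical efficiency of layered agent architectures and the often frustrating real-world performance of current AI coding tools.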

Original text

Claude Fable is the architect — it designs every slice, freezes the acceptance gates, and judges the results. GPT-5.5 Codex is the builder and researcher — it does all the engineering and all the web research, in parallel, unattended, for hours. Two Claude Code skills that run this cross-vendor loop on the flat-rate subscriptions you already have — no API keys, no token bills.

git clone https://github.com/DanMcInerney/architect-loop
cd architect-loop && ./install.sh        # Windows: .\install.ps1
npm i -g @openai/codex@latest            # the builder (Codex CLI >= 0.133)

./install.sh --project installs to the current repo only instead of globally. You need Claude Code on any paid plan and the Codex CLI signed into a ChatGPT plan.

/architect                                      # the build loop
/architect-research <what you're considering>   # the research loop

/architect runs one work block: judge the last run, spec the next slice, dispatch builders. /architect-research is for when you're still deciding what to build — its cited report feeds the build loop's PRD.

/architect flow

One short Fable session per work block — judgment only, it never writes code:

  • Spec + gates first. Fable specs a one-PR slice, splits it into 1–4 lanes with provably disjoint file sets, and commits the acceptance gates to docs/gates/ before any builder starts. Gates are read-only; a builder edit to a gate file fails the slice automatically.
  • Parallel isolated builders. One fresh codex exec (xhigh) per lane, each in its own git worktree. Builders must argue with the spec before building (silent compliance = defect), build only their declared files, and report raw results — they physically can't commit (the sandbox protects .git).
  • Fable judges and integrates. It runs the gate commands itself (builder claims are hearsay), reads the diff against the spec's intent (passing tests ≠ mergeable work), then commits and merges passing lanes. Judgment happens in a fresh session — cross-context review measurably beats same-session review.
  • The repo is the only memory. docs/HANDOFF.md (a short table of contents, pruned every session), docs/gates/, docs/lanes/, git history. Not in the repo = didn't happen.
  • Supervision built in. Liveness checks on dispatched runs, stall triage (diagnose the child process tree, kill the narrowest thing), explicit timeouts on every long command.
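
The lane rules above (disjoint file sets, read-only gates, builders touching only their declared files) can be sketched as two small checks. This is a minimal illustration, not the project's actual implementation: the function names, lane names, and file paths are hypothetical, paths are compared as exact strings (no globs or directories), and the list of changed files is assumed to come from something like `git status --porcelain` run inside the lane's worktree.

```python
from itertools import combinations
from pathlib import PurePosixPath

GATES_DIR = PurePosixPath("docs/gates")  # gate directory named in the text


def overlapping_lanes(lanes: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return each pair of lanes that declares at least one file in common."""
    conflicts = []
    for name_a, name_b in combinations(sorted(lanes), 2):
        shared = lanes[name_a] & lanes[name_b]
        if shared:
            conflicts.append((name_a, name_b, shared))
    return conflicts


def check_lane(changed: list[str], declared: set[str]) -> dict[str, list[str]]:
    """Split a lane's changed paths into tamper and boundary violations."""
    tamper, boundary = [], []
    for raw in changed:
        if GATES_DIR in PurePosixPath(raw).parents:
            tamper.append(raw)      # any edit under docs/gates/ counts as tampering
        elif raw not in declared:
            boundary.append(raw)    # file outside the lane's declared set
    return {"tamper": tamper, "boundary": boundary}


# Hypothetical lane plan: names and paths are made up for illustration.
lanes = {
    "lane-a": {"src/api.py", "tests/test_api.py"},
    "lane-b": {"src/cli.py"},
    "lane-c": {"src/api.py", "docs/usage.md"},
}
print(overlapping_lanes(lanes))
print(check_lane(["src/cli.py", "docs/gates/slice-1.md", "README.md"], lanes["lane-b"]))
```

In this sketch a plan is dispatchable only when `overlapping_lanes` returns an empty list, and a lane is eligible for judging only when both violation lists are empty.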

/architect-research flow

Scout-first, like the production deep-research systems — no fixed lane taxonomy:

  • A cheap Codex scout maps the topic (~10 searches): canonical terminology, the load-bearing systems and papers, the named people, the topic's natural fault lines. Skipped for comparisons and fact-finds.
  • Fable designs 3–6 topic-specific lanes from the scout's map, drawing per-source-class tactics from a library (academic citation snowballing, dependents-not-stars repo evidence, emerging-vs-hype gating, production pattern mining, expert tracking) — checked for overlap and gaps before dispatch.
  • Parallel Codex researchers run under hard budgets: search caps, ≤5 subjects per lane, saturation stop, strict findings discipline (URL + date + quote + confidence tag; NOT FOUND beats inference; no recommendations). Expert opinion runs as a second wave, roster-seeded by the first.
  • Fable verifies and writes. ≥2 independent sources per load-bearing claim, adversarial falsification searches, citations only from URLs actually fetched — then one author writes one decision-oriented report. Gathering parallelizes; synthesis never does.
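
The findings discipline and the two-source rule can be pictured with a small record type and a filter. This is a sketch under stated assumptions: the `Finding` fields mirror the "URL + date + quote + confidence tag" list, but the class, the function names, and the confidence values are hypothetical, and treating distinct hostnames as "independent sources" is a crude stand-in for whatever independence test the architect actually applies.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Finding:
    claim: str
    url: str
    date: str        # date as reported by the source
    quote: str       # verbatim supporting text
    confidence: str  # hypothetical tags, e.g. "high", "medium", "low"


def source_hosts(findings: list[Finding], claim: str) -> set[str]:
    """Hostnames backing a claim; findings without a URL or quote are ignored."""
    hosts = set()
    for f in findings:
        if f.claim == claim and f.quote:
            host = urlparse(f.url).hostname
            if host:
                hosts.add(host)
    return hosts


def unsupported_claims(findings: list[Finding], min_sources: int = 2) -> list[str]:
    """Claims backed by fewer than min_sources distinct hosts."""
    claims = sorted({f.claim for f in findings})
    return [c for c in claims if len(source_hosts(findings, c)) < min_sources]


# Made-up findings for illustration only.
findings = [
    Finding("X scales linearly", "https://a.example/post", "2025-01-10", "scales linearly", "high"),
    Finding("X scales linearly", "https://b.example/paper", "2025-02-02", "linear scaling", "medium"),
    Finding("Y is faster", "https://a.example/one", "2025-03-01", "Y wins", "low"),
    Finding("Y is faster", "https://a.example/two", "2025-03-05", "Y wins again", "low"),
]
print(unsupported_claims(findings))
```

Here "Y is faster" is flagged because both of its findings come from the same host, so it would need another source or be dropped from the report.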

Each piece is there because evidence put it there (full citations in DESIGN.md):

  • Weak planners hurt more than weak executors — so the strongest model does the design, and builders get exhaustive specs.
  • Manager + worktree-isolated workers is the measured-best topology for shared-artifact software work; naive shared-file coordination collapses throughput.
  • Frozen external gates beat trusting the agent — but agents game visible tests and their passing PRs are frequently unmergeable, so the architect also reads the diff.
  • Memory files rot — so the handoff stays a short map, and detail lives in linked gate/lane files.
  • Every production deep-research system uses planner-designed decomposition, none uses fixed lanes — so research lanes are designed per topic, after a scout pass.

Do I need API keys? No. Claude Code runs on your Claude plan; Codex CLI on your ChatGPT plan.

What does a run cost? Builder/researcher runs draw on your ChatGPT plan's 5-hour and weekly quotas; a multi-hour run is a meaningful fraction of a weekly window. Fable's architect sessions are minutes, not hours.

What if a builder wrecks things? Nothing reaches a branch until the architect's tamper, boundary, and gate checks pass — worktrees are discarded and re-dispatched from the freeze commit.
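
As an illustration of a gate check with an explicit timeout, the sketch below runs one gate command and reduces it to a verdict based on the exit code. The `run_gate` helper and its return values are hypothetical, not part of the project. Note that `subprocess.run` kills only the direct child on timeout, so a real supervisor would also need to deal with the rest of the process tree.

```python
import subprocess
import sys


def run_gate(command: list[str], timeout_s: float) -> str:
    """Run one gate command and return "pass", "fail", or "timeout"."""
    try:
        done = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"  # treated as a failed gate, never as a pass
    return "pass" if done.returncode == 0 else "fail"


# Stand-in gate commands; a real gate would be whatever docs/gates/ specifies.
print(run_gate([sys.executable, "-c", "raise SystemExit(0)"], timeout_s=30))
print(run_gate([sys.executable, "-c", "raise SystemExit(3)"], timeout_s=30))
```

Running the command from the judging process, rather than reading a builder's report of it, matches the idea that builder claims are hearsay.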

Can I watch a run? Yes — every dispatch prints the builder block, so you can paste it into an interactive codex session with /goal instead.

Why two skills? Research-grade fan-out costs ~15× chat-level tokens — it should be a deliberate act, not a side-effect of the build loop.

MIT
