Running Claude Code in a loop to mirror human development practices

Original link: https://anandchowdhary.com/blog/2025/running-claude-code-in-a-loop

## Continuous Claude: AI-driven code improvement

Faced with a large codebase that needed extensive test coverage, developer Anand Chowdhary built **Continuous Claude**, a CLI tool that runs Claude Code in a continuous, iterative loop. Unlike typical AI coding assistants that operate in isolation, Continuous Claude mimics CI/CD practices, invoking Claude Code repeatedly and maintaining context across iterations through a shared markdown file that acts as "memory".

The tool automates a full workflow: creating branches, committing changes, opening pull requests, monitoring CI, and merging or discarding based on the outcome. This lets Claude work through large tasks incrementally, learning from failures and building on earlier progress, effectively serving as a "Dependabot on steroids" for refactoring, dependency updates, and more.

The key to making it work is prompting Claude to aim for *meaningful progress* rather than a complete solution, and to leave clear notes for the next iteration. Inspired by ideas like the "radiation of probabilities", the approach values overall direction over the perfection of any single run, and it proves remarkably fault-tolerant and efficient, especially as token costs fall. Continuous Claude plugs into existing GitHub workflows, leveraging code review and CI checks without extra setup.

A Hacker News discussion centered on using Claude (an AI model) to automatically generate unit tests for a large codebase. The original poster explored the approach because a contract required substantial test coverage. Commenters quickly pointed out the limitations: Claude often produces "barely passing junk" tests, especially for complex, real-world systems. Effective test generation requires a highly involved, iterative process: guiding Claude to define test requirements, writing tests one at a time, and *proving* they work by deliberately breaking the code and adjusting the tests accordingly. The process is slow and expensive in tokens, and many commenters found it more efficient to write the tests themselves, since Claude behaves like an intern that keeps resetting and occasionally breaks things. Related projects were also shared: the GitHub project "claude-loop" and Anthropic's own "Ralph Wiggum" technique of continuous iteration.

Original article

This all started because I was contractually obligated to write unit tests for a codebase with hundreds of thousands of lines of code and go from 0% to 80%+ coverage in the next few weeks - seems like something Claude should do. So I built Continuous Claude, a CLI tool to run Claude Code in a loop that maintains a persistent context across multiple iterations.

Current AI coding tools tend to halt once they think the job is done, with no real opportunity for self-criticism or further improvement. This one-shot pattern makes it difficult to tackle larger projects. So in contrast to running Claude Code “as is” (which provides help in isolated bursts), what you want is to run Claude Code for a long period of time without exhausting the context window.

Turns out, it’s as simple as just running Claude Code in a continuous loop. Drawing inspiration from CI/CD practices and persistent agents, you can take it a step further by running it on a schedule or through triggers and connecting it to your GitHub pull request workflow. And by persisting relevant context and results from one iteration to the next, the process ensures that knowledge gained in earlier steps is not lost - something stateless AI queries can’t do on their own, so you have to slap it on top by setting up markdown files to store progress and context-engineer accordingly.

## While + git + persistence

The first version of this idea was a simple while loop:

while true; do
  claude --dangerously-skip-permissions "Increase test coverage [...] write notes for the next developer in TASKS.md, [etc.]"
  sleep 1
done

to which my friend Namanyay of Giga AI said “genius and hilarious”. I spent all of Saturday building the rest of the tooling. Now, the Bash script acts as the conductor, repeatedly invoking Claude Code with the appropriate prompts and handling the surrounding tooling. For each iteration, the script:

  1. Creates a new branch and runs Claude Code to generate a commit
  2. Pushes changes and creates a pull request using GitHub’s CLI
  3. Monitors CI checks and reviews via gh pr checks
  4. Merges on success or discards on failure
  5. Pulls the updated main branch, cleans up, and repeats

When an iteration fails, it closes the PR and discards the work. This is wasteful, but with knowledge of test failures, the next attempt can try something different. Because it piggybacks on GitHub’s existing workflows, you get code review and preview environments without additional work - if your repo requires code owner approval or specific CI checks, it will respect those constraints.
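
To make that concrete, here is a minimal sketch of one iteration wired together with git and the gh CLI; the branch naming, prompt text, and notes file are illustrative rather than the tool’s actual implementation:

branch="continuous-claude-$(date +%s)"   # illustrative branch name
git switch -c "$branch"

# Let Claude make a commit's worth of progress and update its notes file
claude --dangerously-skip-permissions "Increase test coverage [...] write notes for the next developer in TASKS.md"

git push -u origin "$branch"
pr_url=$(gh pr create --fill)            # open a PR; --fill reuses the commit message

if gh pr checks "$pr_url" --watch; then  # wait for CI checks to finish
  gh pr merge "$pr_url" --squash --delete-branch
else
  gh pr close "$pr_url" --delete-branch  # discard the failed attempt
fi

git switch main && git pull              # next iteration starts from fresh main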

## Context continuity

A shared markdown file serves as external memory where Claude records what it has done and what should be done next. Without specific prompting instructions, it would create verbose logs that harm more than help - the intent is to keep notes as a clean handoff package between runs. So the key instruction to the model is: “This is part of a continuous development loop… you don’t need to complete the entire goal in one iteration, just make meaningful progress on one thing, then leave clear notes for the next iteration… think of it as a relay race where you’re passing the baton.”
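
In practice, the wiring can be as simple as prepending that standing instruction and the current notes to each invocation. Here’s a rough sketch; the file name and wording are illustrative, not the tool’s actual prompt:

NOTES="TASKS.md"   # the shared markdown file acting as external memory
touch "$NOTES"

prompt="This is part of a continuous development loop. You don't need to complete
the entire goal in one iteration: make meaningful progress on one thing, then leave
clear notes for the next iteration in $NOTES.
Goal: increase test coverage.
Notes from previous iterations:
$(cat "$NOTES")"

claude --dangerously-skip-permissions "$prompt"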

Here’s an actual production example: the previous iteration ended with “Note: tried adding tests to X but failed on edge case, need to handle null input in function Y” and the very next Claude invocation saw that and prioritized addressing it. A single small file reduces context drift, where it might forget earlier reasoning and go in circles.

What’s fascinating is how the markdown file enables self-improvement. A simple “increase coverage” from the user becomes “run coverage, find files with low coverage, do one at a time” as the system teaches itself through iteration and keeps track of its own progress.

## Continuous AI

My friends at GitHub Next have been exploring this idea in their project Continuous AI and I shared Continuous Claude with them.

One compelling idea from the team was running specialized agents simultaneously - one for development, another for tests, a third for refactoring. While this could divide and conquer complex tasks more efficiently, it possibly introduces coordination challenges. I’m trying a similar approach for adding tests in different parts of a monorepository at the same time.
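
Here’s a sketch of that parallel setup (the package paths and per-package notes files are hypothetical): each part of the monorepo gets its own loop and its own memory file so the agents don’t trample each other’s notes.

# One loop per package, each scoped to its own directory and notes file
for pkg in packages/api packages/web packages/cli; do   # hypothetical paths
  (
    while true; do
      claude --dangerously-skip-permissions "Increase test coverage in $pkg only. Write notes for the next iteration in $pkg/TASKS.md."
      sleep 1
    done
  ) &
done
wait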

The agentics project combines an explicit research phase with pre-build steps to ensure the software is restored before agentic work begins. “The fault-tolerance of Agent in a Loop is really important. If things go wrong it just hits the resource limits and tries again. Or the user just throws the generated PR away if it’s not helpful. It’s so much better than having a frustrated user trying to guide an agent that’s gone down a wrong path,” said GitHub Next Principal Researcher Don Syme.

It reminded me of a concept in economics/mathematics called “radiation of probabilities” (I know, pretty far afield, but bear with me) and here, each agent run is like a random particle - not analyzed individually, but the general direction emerges from the distribution. Each run can even be thought of as idempotent: if GitHub Actions kills the process after six hours, you only lose some dirty files that the next agent will pick up anyway. All you care about is that it’s moving in the right direction in general, for example increasing test coverage, rather than what an individual agent does. This wasteful-but-effective approach becomes viable as token costs approach zero, similar to Cursor’s multiple agents.

## Dependabot on steroids

Tools like Dependabot handle dependency updates, but Continuous Claude can also fix post-update breaking changes using release notes. You could run a GitHub Actions workflow every morning that checks for updates and continuously fixes issues until all tests pass.
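
A rough sketch of such a morning job, assuming a Node project (the commands and prompt are illustrative, not a documented Continuous Claude feature):

# Morning job: update dependencies, then loop Claude until the test suite passes again
npm update   # pull in the latest compatible versions
until npm test; do
  claude --dangerously-skip-permissions "A dependency update broke the build or tests. Read the release notes of the updated packages, fix the breaking changes, and leave notes in TASKS.md."
done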

Large refactoring tasks become manageable: breaking a monolith into modules, modernizing callbacks to async/await, or updating to new style guidelines. It could perform a series of 20 pull requests over a weekend, each doing part of the refactor with full CI validation. There’s a whole class of tasks that are too mundane for humans but still require attention to avoid breaking the build.

The model mirrors human development practices. Claude Code handles the grunt work, but humans remain in the loop through familiar mechanisms like PR reviews. Download the CLI from GitHub to get started: AnandChowdhary/continuous-claude
