Caliper: Right-size your CI runners

Original link: https://www.attune.inc/blog/caliper

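## Caliper: Optimizing CI Runner Costs Through Benchmarking

Many teams overspend on CI runners without knowing whether they are getting the best performance. Attune built **Caliper**, a CLI tool that benchmarks build commands across a range of CPU/RAM configurations to take out the guesswork. Caliper uses Docker containers to simulate different runner sizes, runs the build multiple times, and reports detailed statistics such as mean build time, median, and success rate. Its key feature is **matrix mode**, which automatically tests every combination of the specified CPU and RAM values.

Benchmarking the InfluxDB Rust build revealed key insights: **CPU scaling has diminishing returns.** Going from 2 to 4 or 8 cores gives a significant improvement, but beyond 16 cores the gains are minimal. **RAM, however, has negligible impact**: 8GB is enough for this build. The results suggest that 4-8 CPUs is a cost-effective "sweet spot" for speed, and stress that build performance varies by language and project. Caliper lets teams **benchmark *their own* builds** to identify the best runner configuration and avoid unnecessary cost. Caliper is available on GitHub and can be installed with a simple script.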

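## Caliper CI Runner Discussion Summary

A Hacker News discussion centered on **Caliper**, a tool for right-sizing CI runners. One key point was that **faster single-core CPU performance often matters more for CI/CD tasks than a higher core count**. One commenter suggested that desktop CPUs such as the Ryzen 9950X may outperform enterprise-grade Epyc processors thanks to their stronger per-core speed, and linked to a blog post detailing a faster self-hosted CI/CD setup.

The conversation also touched on **dynamic resource allocation**. Ideally, a build system would orchestrate builds intelligently based on their actual core needs, since some builds are more resource-hungry than others. **Nixbuild.net** was mentioned as a service that tracks build resource usage and adjusts allocations accordingly, even restarting builds with more memory when needed. The core challenge is accurately understanding the build dependency graph (DAG) in order to optimize resource allocation.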

Original article

January 16, 2026

The problem: CI runners are a black box

How do you know if you're overpaying for CI runners? Is it actually more expensive to run longer on a smaller runner than run shorter on a larger one? You pick a runner size sort of randomly, builds run, and you pay the bill. But is a 32-core runner actually faster than a 16-core one for your builds? Does more RAM help? Without data, you have to just make your best guess.

We built Caliper to answer these questions with actual measurements.

What Caliper does

Caliper is a CLI tool that benchmarks your build commands across different CPU/RAM configurations. It uses Docker containers with resource limits to simulate different runner sizes, runs multiple iterations with a warm-up run, and calculates build time statistics: mean, median, standard deviation, P90, P95, and success rate.
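To make the reported statistics concrete, here is a minimal sketch of how such a summary could be computed from a list of run durations. It is not Caliper's implementation: the `summarize` and `percentile` helpers, the nearest-rank percentile method, and the sample timings are all assumptions made for illustration.

```python
import math
import statistics

def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def summarize(durations, failures=0):
    """Summarize successful build durations (seconds) plus a count of failed runs."""
    ordered = sorted(durations)
    total_runs = len(ordered) + failures
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "p90": percentile(ordered, 90),
        "p95": percentile(ordered, 95),
        "success_rate": len(ordered) / total_runs,
    }

# Hypothetical timings in seconds for ten measured runs (warm-up run already discarded).
runs = [210.1, 209.4, 210.8, 209.9, 210.3, 211.0, 209.7, 210.2, 210.5, 209.8]
stats = summarize(runs)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With only ten runs per configuration, P90 and P95 under the nearest-rank method are simply the ninth and tenth slowest samples, so they are best read as "near-worst case" rather than precise tail estimates.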

The key feature is matrix mode: give Caliper a list of CPU and RAM values, and it will test every combination automatically and provide stats.
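The idea behind matrix mode can be sketched as a Cartesian product over the CPU and RAM lists, with each pair turned into a resource-limited container invocation. The sketch below only builds the command lines and does not run anything. `--cpus` and `--memory` are standard `docker run` options, but whether Caliper uses exactly these flags is an assumption, and the `docker_commands` helper and the `rust:latest` image are placeholders.

```python
from itertools import product

def docker_commands(image, build_command, cpus, rams_gb):
    """Build one `docker run` argument list per CPU/RAM combination."""
    commands = []
    for cpu, ram in product(cpus, rams_gb):
        commands.append([
            "docker", "run", "--rm",
            f"--cpus={cpu}",      # limit the container's CPU time to this many cores
            f"--memory={ram}g",   # hard memory limit in gigabytes
            image,
            "sh", "-c", build_command,
        ])
    return commands

# Placeholder image and a 2x2 matrix: four configurations in total.
matrix = docker_commands("rust:latest", "cargo clean && cargo build", [2, 4], [8, 16])
for argv in matrix:
    print(" ".join(argv))
```

Note that `--cpus` limits CPU time rather than hiding cores from the container, so a build tool that sizes its job count from the host's core count may still spawn more parallel jobs than the limit suggests.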

Real results: Benchmarking InfluxDB

We benchmarked the InfluxDB Rust build (cargo clean && cargo build) across 25 configurations on a Hetzner AX162-R dedicated server, 10 runs per configuration:

CPUs	RAM	Mean	Median	Std Dev	Min	Max	Success
2	8 GB	6m2s	6m2s	152ms	6m1s	6m2s	100%
2	16 GB	6m0s	6m1s	142ms	6m0s	6m1s	100%
2	32 GB	6m1s	6m1s	545ms	5m59s	6m1s	100%
2	64 GB	6m0s	6m0s	184ms	6m0s	6m0s	100%
2	128 GB	6m1s	6m2s	637ms	6m0s	6m2s	100%
4	8 GB	3m30s	3m30s	601ms	3m29s	3m31s	100%
4	16 GB	3m28s	3m28s	684ms	3m27s	3m29s	100%
4	32 GB	3m29s	3m29s	572ms	3m28s	3m30s	100%
4	64 GB	3m29s	3m30s	966ms	3m28s	3m30s	100%
4	128 GB	3m29s	3m29s	861ms	3m28s	3m30s	100%
8	8 GB	2m41s	2m41s	1.2s	2m38s	2m43s	100%
8	16 GB	2m39s	2m40s	2.0s	2m36s	2m41s	100%
8	32 GB	2m40s	2m40s	1.4s	2m37s	2m42s	100%
8	64 GB	2m39s	2m41s	3.5s	2m33s	2m42s	100%
8	128 GB	2m41s	2m41s	2.2s	2m34s	2m42s	100%
16	8 GB	2m14s	2m14s	829ms	2m13s	2m15s	100%
16	16 GB	2m13s	2m12s	901ms	2m11s	2m15s	100%
16	32 GB	2m12s	2m12s	499ms	2m11s	2m13s	100%
16	64 GB	2m13s	2m14s	761ms	2m12s	2m15s	100%
16	128 GB	2m13s	2m13s	800ms	2m12s	2m14s	100%
32	8 GB	2m12s	2m12s	831ms	2m11s	2m13s	100%
32	16 GB	2m11s	2m11s	1.0s	2m9s	2m12s	100%
32	32 GB	2m9s	2m11s	2.6s	2m6s	2m13s	100%
32	64 GB	2m13s	2m12s	638ms	2m12s	2m14s	100%
32	128 GB	2m11s	2m12s	1.2s	2m8s	2m13s	100%

CPUs scale with diminishing returns

[Figure: Build Time by CPU Count (8GB RAM)]

Going from 2 to 4 CPUs cuts build time by about 42% (6m2s to 3m30s). 4 to 8 CPUs gives another ~23% improvement. 8 to 16 gives ~17%. Beyond 16 CPUs, there's almost no improvement (under 2%).
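The per-step improvements can be recomputed directly from the 8 GB rows of the results table. This is a small sketch: the mean times are the table values converted to seconds, and the `step_improvements` helper is introduced here for illustration.

```python
# Mean build times at 8 GB RAM, taken from the results table (seconds).
mean_seconds = {2: 362, 4: 210, 8: 161, 16: 134, 32: 132}

def step_improvements(times):
    """Return (from_cpus, to_cpus, percent_reduction) for each doubling of CPU count."""
    cpus = sorted(times)
    steps = []
    for small, large in zip(cpus, cpus[1:]):
        reduction = (times[small] - times[large]) / times[small] * 100
        steps.append((small, large, round(reduction, 1)))
    return steps

for small, large, pct in step_improvements(mean_seconds):
    print(f"{small:>2} -> {large:<2} CPUs: {pct}% less build time")
```

On these numbers the reductions come out at roughly 42%, 23%, 17%, and 1.5% for each successive doubling.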

The sweet spot is 4-8 CPUs. A 4-core runner costs 2x as much as a 2-core but runs ~1.7x faster, so each build costs only about 16% more while giving much faster feedback. If you really care about speed, go to 16. Beyond that, you're burning money for no benefit.
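The cost side of the trade-off is simple arithmetic: cost per build is the runner's price per unit time multiplied by build time. The sketch below assumes price scales linearly with core count (a 4-core runner costs twice a 2-core one, as stated above); real provider pricing may differ, and the `relative_cost` helper is hypothetical.

```python
# Mean build times at 8 GB RAM from the results table (seconds).
mean_seconds = {2: 362, 4: 210, 8: 161, 16: 134, 32: 132}

def relative_cost(times, baseline_cpus=2):
    """Cost per build relative to the baseline, assuming price is proportional to cores."""
    base = baseline_cpus * times[baseline_cpus]  # core-seconds used by the baseline runner
    return {cpus: round(cpus * secs / base, 2) for cpus, secs in sorted(times.items())}

costs = relative_cost(mean_seconds)
for cpus, ratio in costs.items():
    print(f"{cpus:>2} CPUs: {ratio}x the cost per build of a 2-CPU runner")
```

Under that pricing assumption, a build on 4 CPUs costs about 1.16x the 2-CPU build, 8 CPUs about 1.78x, 16 CPUs about 2.96x, and 32 CPUs about 5.83x.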

RAM doesn't matter above 8GB

[Figure: Build Time by RAM (4 CPUs)]

At 4 CPUs, build time was 3m 30s with 8GB and 3m 29s with 128GB. The difference is noise. We saw the same pattern across all CPU configurations: RAM simply doesn't affect this Rust build.

Save your money: 8GB is enough.
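One quick way to check the "difference is noise" claim is to compare, for each CPU count, the gap between the fastest and slowest mean across all RAM sizes. The sketch below uses the mean column of the results table converted to seconds. The `ram_spread` helper is introduced here for illustration.

```python
# Mean build times in seconds from the results table, keyed by CPU count,
# listed in RAM order: 8, 16, 32, 64, 128 GB.
means_by_cpu = {
    2: [362, 360, 361, 360, 361],
    4: [210, 208, 209, 209, 209],
    8: [161, 159, 160, 159, 161],
    16: [134, 133, 132, 133, 133],
    32: [132, 131, 129, 133, 131],
}

def ram_spread(means):
    """Largest minus smallest mean build time across RAM sizes, per CPU count."""
    return {cpus: max(values) - min(values) for cpus, values in means.items()}

for cpus, spread in ram_spread(means_by_cpu).items():
    print(f"{cpus:>2} CPUs: means differ by at most {spread}s across RAM sizes")
```

The spread is 2 seconds for 2 through 16 CPUs and 4 seconds at 32 CPUs, and the 8 GB rows are not consistently the slowest, which fits the conclusion that extra RAM does not speed up this build.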

Your builds will be different

This is a Rust build. JavaScript bundlers, Python test suites, Go compilers, and Java builds all behave differently. Some are memory-bound, some are I/O-bound, some parallelize better than others. The only way to know what's optimal for your builds is to benchmark them yourself.

Try it yourself

Install Caliper:

curl -sSL https://raw.githubusercontent.com/attunehq/caliper/main/install.sh | sh

Run a matrix benchmark (adjust image, command, and configs as needed):

caliper matrix all \
  --image ubuntu-2404-go-rust \
  --repo https://github.com/org/repo \
  --runs 10 \
  --command "cargo clean && cargo build" \
  --cpus "2,4,8,16" \
  --rams "8,16,32,64"

Full documentation and source code are available on GitHub.

About Attune

Attune is an applied AI company building the future of software engineering tools. We love the craft of making software, and we think AI can be a useful tool for serious engineers. You can see more of the things we are working on here.