卡尺:调整你的 CI 运行器规模
Caliper: Right-size your CI runners

原始链接: https://www.attune.inc/blog/caliper

## Caliper:通过基准测试优化 CI Runner 成本 许多团队在 CI runner 上超支,却不知道是否获得了最佳性能。Attune 开发了 **Caliper**,一个 CLI 工具,用于在各种 CPU/RAM 配置上对构建命令进行基准测试,消除猜测。 Caliper 使用 Docker 容器模拟不同的 runner 大小,多次运行构建,并提供详细的统计信息,如平均构建时间、中位数和成功率。其关键功能是 **矩阵模式**,自动测试指定的 CPU 和 RAM 值的每种组合。 对 InfluxDB Rust 构建的基准测试揭示了关键见解:**CPU 扩展收益递减。** 从 2 到 4/8 核的提升显著,但超过 16 核的提升微乎其微。**然而,RAM 的影响可以忽略不计** – 8GB 对于此构建来说已经足够。 结果表明,4-8 个 CPU 是一个具有成本效益的速度“最佳点”,并强调构建性能因语言和项目而异。Caliper 赋能团队 **基准测试 *他们自己的* 构建**,以识别最佳 runner 配置并避免不必要的成本。 Caliper 在 GitHub 上可用,并可以通过一个简单的脚本安装。

## Caliper CI Runner 讨论总结 一篇 Hacker News 的讨论围绕 **Caliper** 工具展开,该工具用于合理配置 CI 运行器。一个关键点是,**更快的单核 CPU 性能通常比更高的核心数量对 CI/CD 任务更有优势**。一位评论员建议桌面 CPU,如 Ryzen 9950X,可能由于其更强的每核速度而优于企业级 Epyc 处理器,并链接到一篇博客文章,详细介绍了更快的自托管 CI/CD 设置。 对话还涉及 **动态资源分配**。理想情况下,构建系统应根据实际核心需求智能地编排构建,因为有些构建比其他构建更消耗资源。**Nixbuild.net** 被提及为一个跟踪构建资源使用情况并相应调整分配的服务,甚至在需要时使用更多内存重新启动构建。核心挑战在于准确理解构建依赖图 (DAG) 以优化资源分配。
相关文章

原文

The problem: CI runners are a black box

How do you know if you're overpaying for CI runners? Is it actually more expensive to run longer on a smaller runner than run shorter on a larger one? You pick a runner size sort of randomly, builds run, and you pay the bill. But is a 32-core runner actually faster than a 16-core one for your builds? Does more RAM help? Without data, you have to just make your best guess.

We built Caliper to answer these questions with actual measurements.

What Caliper does

Caliper is a CLI tool that benchmarks your build commands across different CPU/RAM configurations. It uses Docker containers with resource limits to simulate different runner sizes, runs multiple iterations with a warm-up run, and calculates build time statistics: mean, median, standard deviation, P90, P95, and success rate.

The key feature is matrix mode: give Caliper a list of CPU and RAM values, and it will test every combination automatically and provide stats.

Real results: Benchmarking InfluxDB

We benchmarked the InfluxDB Rust build (cargo clean && cargo build) across 25 configurations on a Hetzner AX162-R dedicated server, 10 runs per configuration:

CPUsRAMMeanMedianStd DevMinMaxSuccess
28 GB6m2s6m2s152ms6m1s6m2s100%
216 GB6m0s6m1s142ms6m0s6m1s100%
232 GB6m1s6m1s545ms5m59s6m1s100%
264 GB6m0s6m0s184ms6m0s6m0s100%
2128 GB6m1s6m2s637ms6m0s6m2s100%
48 GB3m30s3m30s601ms3m29s3m31s100%
416 GB3m28s3m28s684ms3m27s3m29s100%
432 GB3m29s3m29s572ms3m28s3m30s100%
464 GB3m29s3m30s966ms3m28s3m30s100%
4128 GB3m29s3m29s861ms3m28s3m30s100%
88 GB2m41s2m41s1.2s2m38s2m43s100%
816 GB2m39s2m40s2.0s2m36s2m41s100%
832 GB2m40s2m40s1.4s2m37s2m42s100%
864 GB2m39s2m41s3.5s2m33s2m42s100%
8128 GB2m41s2m41s2.2s2m34s2m42s100%
168 GB2m14s2m14s829ms2m13s2m15s100%
1616 GB2m13s2m12s901ms2m11s2m15s100%
1632 GB2m12s2m12s499ms2m11s2m13s100%
1664 GB2m13s2m14s761ms2m12s2m15s100%
16128 GB2m13s2m13s800ms2m12s2m14s100%
328 GB2m12s2m12s831ms2m11s2m13s100%
3216 GB2m11s2m11s1.0s2m9s2m12s100%
3232 GB2m9s2m11s2.6s2m6s2m13s100%
3264 GB2m13s2m12s638ms2m12s2m14s100%
32128 GB2m11s2m12s1.2s2m8s2m13s100%

CPUs scale with diminishing returns

Build Time by CPU Count (8GB RAM)

Going from 2 to 4 CPUs cuts build time nearly in half (6m to 3.5m). 4 to 8 CPUs gives another ~25% improvement. 8 to 16 gives ~17%. Beyond 16 CPUs, there's almost no improvement.

The sweet spot is 4-8 CPUs. A 4-core runner costs 2x more than a 2-core but runs ~1.7x faster, making it roughly cost-neutral with much faster feedback. If you really care about speed, go to 16. Beyond that, you're burning money for no benefit.

RAM doesn't matter above 8GB

Build Time by RAM (4 CPUs)

At 4 CPUs, build time was 3m 30s with 8GB and 3m 29s with 128GB. The difference is noise. We saw the same pattern across all CPU configurations: RAM simply doesn't affect this Rust build.

Save your money: 8GB is enough.

Your builds will be different

This is a Rust build. JavaScript bundlers, Python test suites, Go compilers, and Java builds all behave differently. Some are memory-bound, some are I/O-bound, some parallelize better than others. The only way to know what's optimal for your builds is to benchmark them yourself.

Try it yourself

Install Caliper:

curl -sSL https://raw.githubusercontent.com/attunehq/caliper/main/install.sh | sh

Run a matrix benchmark (adjust image, command, and configs as needed):

caliper matrix all \ --image ubuntu-2404-go-rust \ --repo https://github.com/org/repo \ --runs 10 \ --command "cargo clean && cargo build" \ --cpus "2,4,8,16" \ --rams "8,16,32,64"

Full documentation and source code are available on GitHub.

About Attune

Attune is an applied AI company building the future of software engineering tools. We love the craft of making software, and we think AI can be a useful tool for serious engineers. You can see more of the things we are working on here.

联系我们 contact @ memedata.com