Disk I/O bottlenecks in GitHub Actions

Original link: https://depot.dev/blog/uncovering-disk-io-bottlenecks-github-actions-ci

Disk I/O can be a hidden bottleneck in CI pipeline performance. Tools like `iostat` (for monitoring) and `fio` (for benchmarking) can help identify these issues. Monitoring the dependency installation process, especially cache restoration and extraction, is essential. Caching helps, but extracting a large dependency tree made up of many small files can saturate disk I/O, particularly writes. GitHub runners typically have disk bandwidth limits (around 210 MB/s in this example), so even fast downloads end up constrained by write speed. `fio` can confirm whether you are hitting these limits. Optimizing your workflow with caching and parallelization helps, but understanding your runner's limits is critical. Use a matrix strategy and `fio` to compare disk performance across different runners. Faster options such as the Depot Ultra Runner can mitigate these bottlenecks with a RAM disk cache and more powerful CPUs. By identifying bottlenecks, you can choose runners strategically and tune caching to improve CI performance.

The Hacker News discussion focuses on slow builds caused by disk I/O bottlenecks in GitHub Actions. Many users offered suggestions for speeding up `apt` installs and overall performance. `ValdikSS` suggested using `eatmydata` to disable `fsync()` calls while installing packages. `nijave` proposed disabling `fsync` at the OS level, arguing that data corruption is acceptable on ephemeral CI nodes. `wtallis` suggested running CI containers entirely from `tmpfs` (an in-memory filesystem). `jacobwg` mentioned experimenting with in-memory write caching and flags such as `noatime`. `candiddevmike` promoted EtchaOS, a small, immutable, in-memory OS tuned for CI runners. `suryao` argued that the ultimate solution is NVMe storage with high throughput and IOPS attached directly to the compute, promoting WarpBuild, which takes this approach, and contrasting it with higher-latency network-attached disks.
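As a concrete sketch of the eatmydata suggestion (the packages installed below are placeholders): the tool is an LD_PRELOAD wrapper that turns fsync()/fdatasync() into no-ops for the wrapped command.

# Install the wrapper, then run apt through it so package installs skip fsync.
# Skipping fsync risks corruption on power loss, which is generally acceptable
# on a throwaway CI machine.
sudo apt-get install -y eatmydata
sudo eatmydata apt-get install -y build-essential postgresql-client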

Original article

Disk I/O bottlenecks are easy to overlook when analyzing CI pipeline performance, but tools like iostat and fio can help shed light on what might be slowing down your pipelines more than you realize.

GitHub offers different hosted runners with a range of specs, but for this test we are using the default ubuntu-22.04 runner in a private repository, which gives us an additional 2 vCPUs but does not alter disk performance.

How to monitor disk performance

Getting a baseline benchmark from a tool like fio is useful for comparing the relative disk performance of different runners. However, to investigate if you are hitting disk I/O bottlenecks in your CI pipeline, it is more useful to monitor disk performance during the pipeline execution.

We can use a tool like iostat to monitor the disk while installing dependencies from the cache to see how much we are saturating the disk.

- name: Start IOPS Monitoring
  run: |
    echo "Starting IOPS monitoring"
    # Start iostat in the background, logging IOPS every second to iostat.log
    nohup iostat -dx 1 > iostat.log 2>&1 &
    echo $! > iostat_pid.txt  # Save the iostat process ID to stop it later

- uses: actions/cache@v4
  timeout-minutes: 5
  id: cache-pnpm-store
  with:
    path: ${{ steps.get-store-path.outputs.STORE_PATH }}
    key: pnpm-store-${{ hashFiles('pnpm-lock.yaml') }}
    restore-keys: |
      pnpm-store-${{ hashFiles('pnpm-lock.yaml') }}
      pnpm-store-

- name: Stop IOPS Monitoring
  run: |
    echo "Stopping IOPS monitoring"
    kill $(cat iostat_pid.txt)

- name: Save IOPS Data
  uses: actions/upload-artifact@v4
  with:
    name: iops-log
    path: iostat.log
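The iostat.log artifact is just raw text; a small post-processing step like the following sketch can pull out an average write throughput. The device name sda and the wkB/s column name are assumptions here, and column names vary between sysstat versions.

- name: Summarize iostat Log
  run: |
    # Average the wkB/s column for the root disk and report it in MB/s.
    # Assumes the device is sda (the default Ubuntu runner's root disk) and
    # that the extended iostat header names the column wkB/s.
    awk '
      /Device/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
      col["wkB/s"] && $1 == "sda" { sum += $(col["wkB/s"]); n++ }
      END { if (n) printf "average write throughput: %.1f MB/s\n", sum / n / 1024 }
    ' iostat.log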

Monitoring disk during untar of Next.js dependencies

In the above test, we used iostat to monitor disk performance while the cache action downloaded and untarred the dependencies for vercel/next.js:

Received 96468992 of 343934082 (28.0%), 91.1 MBs/sec
Received 281018368 of 343934082 (81.7%), 133.1 MBs/sec
Cache Size: ~328 MB (343934082 B)
/usr/bin/tar -xf /home/<path>/cache.tzst -P -C /home/<path>/gha-disk-benchmark --use-compress-program unzstd
Received 343934082 of 343934082 (100.0%), 108.8 MBs/sec
Cache restored successfully

The full step took 12s to complete, and we can estimate the download took around 3s, leaving 9s for the untar operation.

The compressed tarball is only about 328MB, but after extraction, the total amount of data written to the disk is about 1.6GB. That smaller size got our cache across the network plenty fast, and most CPUs can handle decompression fast enough, meaning higher compression is often favorable. Once download and decompression are no longer the bottleneck, that leaves writing to disk.
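If you want to see what your own cache expands to, a quick local check along these lines (the cache filename and paths here are illustrative) shows the compressed size, the data actually written to disk, and how many files are involved:

mkdir -p ./extracted
ls -lh cache.tzst                                 # compressed tarball size
tar -xf cache.tzst -C ./extracted --use-compress-program unzstd
du -sh ./extracted                                # total data written to disk after extraction
find ./extracted -type f | wc -l                  # file count; many small files stress IOPS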

Reading from a tarball is a fairly efficient process since it's mostly sequential reads; however, we then need to write each file to disk individually. This is where we can hit disk I/O bottlenecks, especially with a large number of small files.

It’s important to note that this is just a single run, not an average. Running multiple tests over time will give you a much clearer picture of the overall performance. Variance between runs can be quite high, so an individual bad run doesn’t necessarily indicate a problem.

What this run suggests is a possible throughput bottleneck. We're seeing spikes in total throughput, with most hovering around ~220 MB/s. This is likely the maximum throughput we can achieve on this disk; we'll verify that next. We should continue to monitor this and compare it against other runners to see if we can find an ideal runner for our workflow. We'll use fio to double-check whether we are hitting the disk's maximum throughput.

An interesting aside before we move on: from this side-by-side view we can see how few read operations there are relative to writes. Since we're reading from a tarball, most reads are sequential, which tends to be more efficient. That read data is likely buffered before being written to disk in a more random pattern as each file is recreated. This is why we see higher write IOPS than read IOPS.

Maximum disk throughput

One of the first optimizations developers usually make to their CI pipelines is caching dependencies. Even though the cache still gets uploaded and downloaded with each run, it speeds things up by packaging all your dependencies into one compressed file. This skips the hassle of resolving dependencies, avoids multiple potentially slow downloads, and cuts down on network delays.

But as we saw above, network speed isn't usually our bottleneck when downloading the cache.

Test Type          Block Size   Bandwidth
Read Throughput    1024 KiB     ~209 MB/s
Write Throughput   1024 KiB     ~209 MB/s

Using fio to test our throughput, notice that read and write throughput are capped at the same value. This is a fairly telling sign that the limitation here is not the physical disk itself, but rather a bandwidth limit imposed by GitHub. This is standard practice for dividing resources among multiple users who may be accessing the same physical disk from their virtual machines. It isn't always documented, but most providers have higher bandwidth limits on higher-tier runners.
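For reference, a pair of fio jobs along these lines produces sequential throughput numbers like those in the table above; the file size, runtime, and queue depth here are illustrative rather than the exact parameters used for that table:

mkdir -p $HOME/fio_test

fio --name=read_throughput --directory=$HOME/fio_test --size=1G \
  --time_based --runtime=30s --ramp_time=2s --ioengine=libaio --direct=1 \
  --verify=0 --bs=1024k --iodepth=64 --rw=read --group_reporting=1

fio --name=write_throughput --directory=$HOME/fio_test --size=1G \
  --time_based --runtime=30s --ramp_time=2s --ioengine=libaio --direct=1 \
  --verify=0 --bs=1024k --iodepth=64 --rw=write --group_reporting=1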

What we measured here aligns fairly closely with the 220MB/s we saw in the untar test, giving us another hint that we are likely being slowed down during our dependency installation, not by the network or CPU, but by the disk.

Regardless of how fast our download speed is, we can't write to disk any faster than the disk's maximum throughput.

Estimated time to write to disk: [interactive calculator in the original post for choosing a cache payload size and a throughput speed]
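As a rough worked example using the numbers from this post (treating MB and GB decimally for simplicity): writing the ~1.6 GB of extracted dependency files at the ~210 MB/s cap works out to about 1.6 GB / 210 MB/s ≈ 7.6 seconds of pure write time, which is in the same ballpark as the ~9 seconds we estimated for the untar step, with the remainder going to decompression and filesystem overhead.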

Realistically, your disk performance will vary greatly depending on your specific cache size, the number of files, and just general build-to-build variance. That's why it's a good idea to monitor your CI runners for a consistent baseline, and we'll talk about testing your workflow on multiple runners for comparison.

Maximum IOPS (Input/Output Operations Per Second)

After downloading the cache tarball, it needs to be extracted. Depending on the compression level, extraction can be CPU-intensive, but that isn't usually a problem. When untarring the dependencies, we perform a lot of small read and write operations, which is where we can hit disk I/O bottlenecks.

Test Type           Block Size   IOPS
Read IOPS           4096 B       ~51K
Write IOPS          4096 B       ~57K
Random Read IOPS    4096 B       ~9370
Random Write IOPS   4096 B       ~3290

IOPS is a measure of how many read/write operations can be performed per second. When we have a lot of small files, especially in a node_modules directory, it is possible to saturate the IOPS limit of the disk (or the imposed limit) and hit a different kind of I/O bottleneck.

Similarly to how we can't write to the disk any faster than the bandwidth limit, there is a limit to how many IOPS we can perform on the disk.
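For a back-of-the-envelope illustration (the file count here is assumed for the sake of the math, not measured): if an extracted dependency tree contains around 50,000 small files and each file costs at least one write operation plus metadata updates, then at the ~3.3K random write IOPS measured above that alone is roughly 50,000 / 3,290 ≈ 15 seconds of I/O, before the page cache and write batching soften the impact.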

Running benchmarks on different runners

If you are seeing bottlenecks of any kind in your CI pipeline, you'll want to optimize for them with strategies like caching and parallelization where possible. But you also need to know whether you are hitting the limits of the runner you are using. It's easy enough to add a matrix strategy to your workflow to test on multiple runners for a quick comparison of the same steps on different hardware.

jobs:
  build:
    runs-on: ${{ matrix.runner }}
    strategy:
      matrix:
        runner: [ubuntu-22.04, depot-ubuntu-22.04]

To get a more detailed look at the specific disk performance of each runner, you can use the fio benchmarking tool we mentioned earlier. This will give you a better idea of the disk performance of each runner, and a reference point for checking for bottlenecks in your CI pipeline.

- name: Random Read Throughput Test
  run: |
    mkdir -p $HOME/fio_test  # fio won't create the directory for --filename, so create it first
    fio --ioengine=sync --bs=4k --rw=randread --name=random_read_throughput \
    --direct=1 --filename=$HOME/fio_test/file --time_based --runtime=10s \
    --size=250m --output=random_read_throughput_result-${{ matrix.runner }}.txt

- name: Clean up Test Directory
  run: rm -rf $HOME/fio_test/*

- name: Random Write Throughput Test
  run: |
    fio --ioengine=sync --bs=4k --rw=randwrite --name=random_write_throughput \
    --direct=1 --filename=$HOME/fio_test/file --time_based --runtime=10s \
    --size=250m --output=random_write_throughput_result-${{ matrix.runner }}.txt

- name: Clean up Test Directory
  run: rm -rf $HOME/fio_test/*

- name: Random Read IOPS Test
  run: |
    fio --name=random_read_iops --directory=$HOME/fio_test --size=5G \
    --time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
    --verify=0 --bs=4K --iodepth=256 --rw=randread --group_reporting=1 \
    --iodepth_batch_submit=256 --iodepth_batch_complete_max=256 \
    --output=random_read_iops_result-${{ matrix.runner }}.txt

- name: Clean up Test Directory
  run: rm -rf $HOME/fio_test/*

- name: Random Write IOPS Test
  run: |
    fio --name=random_write_iops --directory=$HOME/fio_test --size=5G \
    --time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
    --verify=0 --bs=4K --iodepth=256 --rw=randwrite --group_reporting=1 \
    --iodepth_batch_submit=256 --iodepth_batch_complete_max=256 \
    --output=random_write_iops_result-${{ matrix.runner }}.txt
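To keep the per-runner numbers around for comparison, one option is to publish the fio output files as artifacts, mirroring the upload-artifact step used for the iostat log earlier:

- name: Save fio Results
  uses: actions/upload-artifact@v4
  with:
    name: fio-results-${{ matrix.runner }}
    path: "*_result-${{ matrix.runner }}.txt"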

Ultra-fast disk I/O with Depot Ultra Runner

Depot is launching a new runner type with ultra-fast disk I/O, the Depot Ultra Runner. The Ultra Runner utilizes a large RAM disk cache and higher-powered CPUs to maximize performance in both high IOPS and high throughput scenarios.

Want to be notified when the Depot Ultra Runner is available? Subscribe to our changelog for all major updates.

Try comparing your current workflow on a Depot runner. Sign up for our 7-day free trial and compare your CI pipeline performance on Depot Runners with a matrix job.
