Reverse engineering the GHA cache to improve performance (2024)

Original link: https://depot.dev/blog/github-actions-cache

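## Depot's faster GitHub Actions runners: summary

Depot has launched GitHub Actions runners designed to significantly shorten build times, especially for Docker builds, by greatly improving cache performance. Traditional hosted CI platforms, including GitHub Actions, are constrained by network speed (about 125 MB/s) and cache size (10 GB), which hinders efficient caching.

Depot gets around these limits by using AWS infrastructure (EC2 instances and S3 storage) together with a custom proxy. The proxy intercepts cache requests, directs them to S3, and uses parallel streams to reach **cache speeds of over 1 GB/s, 10x GitHub's default**. Importantly, Depot avoids forking the standard GitHub Actions cache action, which keeps the developer experience smooth.

The runners are provisioned on demand, use a standby pool to start instantly, and integrate seamlessly with Depot's existing Docker build service for faster pipelines. Depot runners offer unlimited cache size and 30-day retention.

A 7-day free trial is currently available, and users can switch to Depot runners by changing `runs-on: ubuntu-22.04` to `runs-on: depot-ubuntu-22.04` in a GitHub Actions workflow. Depot also offers ARM runners and options for additional disk space.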

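This Hacker News discussion centers on a recent article about reverse engineering the GitHub Actions (GHA) cache to improve build performance. Depot's founder noted that their work from May of last year inspired a similar article by Blacksmith. Users suggested making the article's date explicit (adding "(2024)") to establish Depot's precedence.

A key point is that GitHub has since released v2 of its internal cache API, which is built on Twirp, and that Depot has adapted to the new version, even noting that it can handle Actions artifacts. The conversation also highlighted Depot's excellent engineering work on optimizing builds, with one commenter suggesting that a larger company such as Docker, JFrog, or GitHub should acquire them. Finally, the post also carried a reminder about the Y Combinator Fall 2025 application period.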

Original article


We recently announced our new product, Depot-hosted GitHub Actions runners. Our runners bring an extra improvement in cache speed that's no longer limited to our accelerated Docker builds. We're excited to be bringing faster caching to all kinds of GitHub Actions workloads.

As we were building our runners, we learned a lot about the undocumented inner workings of the GitHub Actions cache. In this post, we share what we learned, how we incorporated this knowledge into our new product, Depot GitHub Actions runners, and how you can use it to make your workflows more efficient.

Use Depot GitHub Actions runners to speed up your builds. Fast machines hosted in AWS, with up to 10x faster caching performance. Try free for 7 days. To get started, create a Depot account and then visit our docs.

The GitHub Actions cache challenge

In order for builds in general, and especially Docker builds, to be efficient within CI, they need to rely heavily on caching, so that the compute- and time-intensive work of building code and dependencies gets reused as much as possible between builds.

The problem with caching in hosted CI platforms is that all runners are ephemeral, so the cache needs to be stored remotely. Then, when the runner needs to use the cache, it has to be transferred over the network in every build before it can be used. Networks are often slow and flaky, which can negate the speed improvements from caching when you have to use them to save and load the cache.

GitHub's own hosted runners suffer from this limitation due to capped network speed. GitHub's runners have access to around 1 Gbps of network throughput, equivalent to 125 MB/s, which greatly limits how quickly cache can be saved or restored. GitHub also limits their cache to 10 GB per repository, which can quickly become exhausted as the cache becomes larger. On top of this, the cache API itself can be flaky.
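To make the bandwidth ceiling concrete, here is a small back-of-the-envelope calculation. It assumes decimal units (1 GB = 1000 MB) and a link running at full line rate with no protocol overhead, so real transfers would be somewhat slower. The helper name `transfer_seconds` is ours, purely for illustration.

```python
def transfer_seconds(size_gb: float, throughput_mb_per_s: float) -> float:
    """Seconds to move size_gb gigabytes at throughput_mb_per_s (decimal units)."""
    return size_gb * 1000 / throughput_mb_per_s


# 1 Gbps is 1000 megabits per second; 8 bits per byte gives 125 MB/s.
GITHUB_MB_PER_S = 1000 / 8

# Restoring a full 10 GB cache at line rate, ignoring protocol overhead.
full_cache_seconds = transfer_seconds(10, GITHUB_MB_PER_S)
print(f"{full_cache_seconds:.0f} s")  # 80 s
```

In other words, a repository that fills its 10 GB allowance would spend well over a minute per build just moving cache data in one direction, before any actual work starts.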

There are alternatives that seek to address these limitations, but they don't solve them completely, or they create developer experience hurdles with their solutions. For example, most other hosted GitHub Actions runner providers have created their own separate caching implementation and forked all GitHub Actions cache actions into their own namespace, in order to point the cache action to their own cache implementation. So, to take advantage of the faster caching that these providers offer, you would need to change all your workflows away from actions/cache@v3 to something like hosting-provider/cache@v1. While the config change might seem small, this involves duplicating work for everyone and maintaining multiple versions of what should effectively be the same GitHub Actions cache actions.

Another issue that alternative GitHub Actions runner providers can introduce is latency and reduced bandwidth, because their compute is hosted with European providers such as Hetzner. While these compute providers are inexpensive, many internet services and infrastructure providers (including GitHub) are hosted in the US, and the added latency and lower bandwidth when moving data between Europe and the US can make some workflows much slower. We believe cache actions should “just work” out of the box and be fast, even on runners other than those hosted by GitHub, and developers shouldn't have to think about which action to use based on where their build is going to run.

How Depot went about solving the GitHub Actions cache challenge

To address the caching challenge when building hosted GitHub Actions runners, we asked: how can we provide GitHub Actions cache without forking the default action and maintaining the entire ecosystem, while also making it significantly faster? The GitHub Actions runner code is freely available under the MIT license on GitHub, but the repo doesn't offer much guidance on how to customize it. So we had to get creative.

Infrastructure-wise, we went all in on AWS, using EC2 instances for the runners and S3 for the cache storage. We believe this is the right combination to get the best performance and access the best compute options, as well as to maximize bandwidth and minimize latency for anything our customers might need to do in AWS as part of their CI builds. AWS is also the stack we have the most experience with, as we use AWS extensively for the Depot Docker Builds product.

Software-wise, we decided to try to “point” the GitHub Actions runner to our own cache instead of the default GitHub Actions cache. The method for doing this was not documented in the runner repo, so we had to do some observation and a bit of reverse engineering.

As the first step, we intercepted the cache interactions on a GitHub Actions runner and realized that the GitHub Cache API responded with blob URLs hosted on Azure Blob storage. This makes sense, given that GitHub Actions runners are hosted in Azure.

Second, we determined that we could point the runner to our cache storage by running a small Go proxy on every runner and routing all GitHub Cache API calls through it, and it could respond with blob URLs hosted on S3 instead. Thankfully, there is a variable that can be changed in the GitHub Actions runner code to facilitate this. Because of the very high throughput possible from EC2 to S3, we had confidence that using EC2 for the runners and S3 for cache storage would already yield a cache speed improvement.
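The redirect idea can be sketched as a single lookup function. This is not Depot's proxy (which is written in Go and is not shown in this post); it is a minimal illustration in Python under several assumptions: the JSON field names `cacheKey` and `archiveLocation` are modeled on what the open-source cache client in actions/toolkit has used for lookup responses, the bucket URL and object paths are hypothetical, and fallback keys are matched exactly rather than by prefix. A real proxy would also authenticate requests and return short-lived signed URLs.

```python
import json
from typing import Optional

# Hypothetical bucket; a real proxy would hand out short-lived signed URLs.
BUCKET_URL = "https://example-cache-bucket.s3.amazonaws.com"


def lookup(index: dict[str, str], keys: list[str], version: str) -> Optional[str]:
    """Return the JSON body for a cache lookup, or None on a cache miss.

    index maps "version/key" to an object path in the bucket. Keys are
    tried in order, so the primary key wins over later fallback keys.
    """
    for key in keys:
        object_path = index.get(f"{version}/{key}")
        if object_path is not None:
            return json.dumps({
                "cacheKey": key,
                # The runner downloads from this URL instead of Azure Blob storage.
                "archiveLocation": f"{BUCKET_URL}/{object_path}",
            })
    return None


index = {"v1/linux-deps-abc": "org/repo/linux-deps-abc.tzst"}
body = lookup(index, ["linux-deps-zzz", "linux-deps-abc"], "v1")
print(body)
```

The point of the design is that the cache action itself is unchanged: it still asks "the cache API" where an archive lives, and only the answer differs.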

Third, after setting up the proxy for the cache, we saw that the uploads to and downloads from the cache were happening via only two parallel streams. While this could have been optimal on Azure, on EC2 and S3 we were confident we could use more parallelism to get higher performance. So we experimented with a higher number of parallel streams until we found the best-performing settings for both upload (4) and download (8) for our usage patterns.
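As a rough illustration of what "parallel streams" means here, the sketch below splits a blob into byte ranges and fetches them concurrently, then reassembles them in order. It is an assumption-laden toy: `fetch_range` stands in for an HTTP range request against blob storage, the chunk size is arbitrary, and only the stream count of 8 for downloads comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def split_ranges(total_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into half-open (start, end) byte ranges."""
    return [(start, min(start + chunk_size, total_size))
            for start in range(0, total_size, chunk_size)]


def parallel_download(fetch_range: Callable[[int, int], bytes],
                      total_size: int, chunk_size: int, streams: int) -> bytes:
    """Fetch all ranges with up to `streams` concurrent workers and join them."""
    ranges = split_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # map() preserves input order, so chunks come back in byte order.
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(chunks)


# Stand-in for an HTTP range GET against blob storage.
blob = bytes(range(256)) * 40
fetch = lambda start, end: blob[start:end]

restored = parallel_download(fetch, len(blob), chunk_size=1024, streams=8)
print(restored == blob)  # True
```

The best stream count depends on the network path and storage backend, which is why it has to be found by experiment rather than derived.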

Effectively, we're running an on-machine implementation of GitHub's Cache API that stores all cache content to S3, is optimized to save and restore that content using many parallel streams, and is fully integrated with Depot's authorization system. With this architecture, we're able to save and restore cache at the rate of over 1 GB/s, a 10x improvement over GitHub's default cache!

Depot's GitHub Actions Runners architecture

While a faster cache helps to improve the overall build performance, we also needed to figure out how to make runners available on demand with the right configuration without spending a crazy amount of money running instances 24/7.

[Figure: architecture diagram showing the Depot-hosted GitHub Actions Runner instances, including the storage used for GitHub Actions cache, located close to Depot's Docker Build infrastructure, increasing the performance of builds for customers that use both products.]

At the infrastructure level, the runners are hosted in a way that's very similar to our Docker builders. Specifically, we provision a new ephemeral EC2 instance for every build, and never reuse instances between builds. We use a standby pool of machines to make sure builds start instantly.
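The standby-pool idea can be sketched in a few lines. This is a simplified model, not Depot's provisioning system: the class and method names are hypothetical, "booting" is a placeholder that just mints an ID, and replenishment happens synchronously where a real system would launch replacement VMs in the background.

```python
from collections import deque
from itertools import count


class StandbyPool:
    """Keeps `target` pre-booted instances ready; each is used for one job only."""

    def __init__(self, target: int):
        self._ids = count(1)
        self.target = target
        self.ready = deque(self._boot() for _ in range(target))

    def _boot(self) -> str:
        # Placeholder for launching a real VM.
        return f"instance-{next(self._ids)}"

    def acquire(self) -> str:
        """Hand out a warm instance if one is available, then top the pool up."""
        instance = self.ready.popleft() if self.ready else self._boot()
        while len(self.ready) < self.target:
            self.ready.append(self._boot())
        return instance


pool = StandbyPool(target=2)
first, second, third = pool.acquire(), pool.acquire(), pool.acquire()
print(first, second, third)  # instance-1 instance-2 instance-3
```

Because an acquired instance is never returned to the pool, every job gets a fresh machine, while the warm queue hides boot time from the job.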

The Runners are located close to the Docker container builders, so if you use both products you can benefit from even more speedup when, for example, you build a container and then pull it back into a CI workflow for testing.

More detail on how we're provisioning VMs in EC2 fast and cost-effectively is coming up in a separate post.

Results of our GitHub Actions cache optimization

With this infrastructure and cache logic, we are able to achieve up to 10x higher cache performance compared to GitHub Actions default runners.

| Aspect | GitHub Actions default | Depot |
| --- | --- | --- |
| Cache download throughput | 100-150 MB/s | 1 GB/s |
| Cache upload throughput | 100-150 MB/s | 1 GB/s |
| Cache size limit | 10 GB | Unlimited |
| Retention | 7 days | 30 days |

For Docker builds, the Depot Runners can also be integrated seamlessly with Depot Docker builders. They live right next to each other, so customers using both benefit from data locality and faster network transfers over private network links rather than over the public internet.

For example, it's easy to load Docker images built on Depot back into a GitHub Actions workflow that uses Depot's runners to run docker compose up, for situations like running integration tests against a newly built Docker image.

Where to next with GitHub Actions Runners and caching

With these speed improvements, which allow your Docker caching to be far faster than with the default GitHub Actions cache, we are hoping to unlock a new level of build acceleration with very little effort required.

We're planning to add additional functionality to Depot Runners. For example, we recently announced the public beta of ARM runners, which can drastically speed up builds that target ARM platforms. We've also released upgrades to disk space so that larger runners get access to larger disk sizes right out of the box.

If you'd like to give Depot Runners a try, create a Depot account (it's free for 7 days), and then change the runs-on parameter from ubuntu-22.04 to depot-ubuntu-22.04 to use a Depot GitHub Actions runner in a GitHub Actions workflow:

jobs:
  build:
    name: Build
-    runs-on: ubuntu-22.04
+    runs-on: depot-ubuntu-22.04
    steps:
      ...

You can also check out our docs for more details.


Kyle Galbraith

CEO & Co-founder of Depot

Platform Engineer who despises slow builds turned founder. Expat living in 🇫🇷
