Whose code am I running in GitHub Actions?

Original link: https://alexwlchan.net/2025/github-actions-audit/

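The recent compromise of the tj-actions/changed-files GitHub Action highlights the risk of using mutable references (tags) in workflows. An attacker injected malicious code that leaked secrets into public build logs. A tag may look like it points to a specific version (for example, v2), but tags are mutable and can be changed to point at a compromised commit.

The author of the original article provides a script for identifying which GitHub Actions are being used through mutable references. The script uses `find`, `xargs`, `grep`, `sed`, `tr`, `awk`, `sort` and `uniq` to parse workflow files, extract action names and tally how often each one is used.

To mitigate this risk, consider using immutable references that specify an exact Git commit ID for each action. Carefully assess how much you trust each action's author, and prefer actions from large organisations with solid security practices. For small pieces of functionality, writing your own script instead of depending on an external action further reduces risk and gives you more control. The article encourages developers to become familiar with Unix text processing tools so they can write their own scripts for data analysis.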

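This Hacker News thread focuses on the security risks of using third-party GitHub Actions, prompted by the recent compromise of the `tj-actions/changed-files` action. The original article on alexwlchan.net discusses pinning actions to SHAs (specific commit IDs) to guard against malicious updates.

Commenters pointed out some limitations, however: short commit IDs are not reliable, and even pinning to a full SHA does not guarantee safety, because a pinned action may not pin its own dependencies. One user mentioned that GitHub Actions in fact requires the full commit SHA. Some users suggested writing your own driver scripts rather than relying on many external actions, to minimise the attack surface. Others use Actions only to trigger a custom webhook and run the actual build on their own servers, which also avoids YAML. One user highlighted cases of Actions being abused for web scraping. The overall consensus leans towards treating GitHub Actions as a potentially compromised target and restricting its access to sensitive resources such as AWS accounts.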

Original article

A week ago, somebody added malicious code to the tj-actions/changed-files GitHub Action. If you used the compromised action, it would leak secrets to your build log. Those build logs are public for public repositories, so anybody could see your secrets. Scary!

Mutable vs immutable references

This attack was possible because it’s common practice to refer to tags in a GitHub Actions workflow, for example:

jobs:
  changed_files:
    ...
    steps:
      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v2
      ...

At a glance, this looks like an immutable reference to an already-released “version 2” of this action, but actually this is a mutable Git tag. If somebody changes the v2 tag in the tj-actions/changed-files repo to point to a different commit, this action will run different code the next time it runs.

If you specify a Git commit ID instead (e.g. a5b3abf), that’s an immutable reference that will run the same code every time.

Tags vs commit IDs is a tradeoff between convenience and security. Specifying an exact commit ID means the code won’t change unexpectedly, but tags are easier to read and compare.
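To make the difference concrete, here is a minimal local sketch of a tag being moved. It assumes git is installed, and the repository, file name and commit messages are all made up for illustration; nothing here touches GitHub or the real tj-actions repo.

```shell
# Throwaway repository in a temporary directory (hypothetical example data).
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" config user.name "Example Author"
git -C "$repo" config user.email "author@example.invalid"

# First release: commit some code and tag it v2.
echo "original code" > "$repo/action.sh"
git -C "$repo" add action.sh
git -C "$repo" commit --quiet -m "Release v2"
git -C "$repo" tag v2
before=$(git -C "$repo" rev-parse 'v2^{commit}')

# Later: commit different code and move the v2 tag onto it.
echo "different code" > "$repo/action.sh"
git -C "$repo" commit --quiet -a -m "Later change"
git -C "$repo" tag --force v2 > /dev/null
after=$(git -C "$repo" rev-parse 'v2^{commit}')

echo "v2 used to point at: $before"
echo "v2 now points at:    $after"

# The old commit ID still names the original content.
git -C "$repo" show "$before:action.sh"
```

The tag name never changed, but the commit it resolves to did. The commit ID captured in `before` keeps pointing at the original file contents.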

Do I have any mutable references?

I wasn’t worried about this particular attack because I don’t use tj-actions, but I was curious about what other GitHub Actions I’m using. I ran a short shell script in the folder where I have local clones of all my repos:

find . -path '*/.github/workflows/*' -type f -name '*.yml' -print0 \
  | xargs -0 grep --no-filename "uses:" \
  | sed 's/\- uses:/uses:/g' \
  | tr '"' ' ' \
  | awk '{print $2}' \
  | sed 's/\r//g' \
  | sort \
  | uniq --count \
  | sort --numeric-sort

This prints a tally of all the actions I’m using. Here’s a snippet of the output:

 1 hashicorp/setup-terraform@v3
 2 dtolnay/rust-toolchain@v1
 2 taiki-e/create-gh-release-action@v1
 2 taiki-e/upload-rust-binary-action@v1
 4 actions/setup-python@v4
 6 actions/cache@v4
 9 ruby/setup-ruby@v1
31 actions/setup-python@v5
58 actions/checkout@v4

I went through the entire list and thought about how much I trust each action and its author.

  • Is it from a large organisation like actions or ruby? They’re not perfect, but they’re likely to have good security procedures in place to protect against malicious changes.

  • Is it from an individual developer or small organisation? Here I tend to be more wary, especially if I don’t know the author personally. That’s not to say that individuals can’t have good security, but there’s more variance in the security setup of random developers on the Internet than among big organisations.

  • Do I need to use somebody else’s action, or could I write my own script to replace it? This is what I generally prefer, especially if I’m only using a small subset of the functionality offered by the action. It’s a bit more work upfront, but then I know exactly what it’s doing and there’s less churn and risk from upstream changes.

I feel pretty good about my list. Most of my actions are from large organisations, and the rest are a few actions specific to my Rust command-line tools which are non-critical toys, where the impact of a compromised GitHub repo would be relatively slight.
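If you want to narrow a list like this down to the mutable references, one possible filter is sketched below. It assumes that a pinned reference is written as a full 40-character hexadecimal commit ID after the `@`, and it treats everything else (tags, branches, abbreviated IDs) as unpinned. The `example-org/example-action` entries and their IDs are made-up placeholders, not real actions or commits.

```shell
# Hypothetical sample input: one action reference per line.
refs='actions/checkout@v4
ruby/setup-ruby@v1
example-org/example-action@0123456789abcdef0123456789abcdef01234567
example-org/example-action@a5b3abf'

# Drop references that end in "@" followed by exactly 40 hex characters;
# what remains is everything not pinned to a full commit ID.
unpinned=$(printf '%s\n' "$refs" | grep -v -E '@[0-9a-f]{40}$')
printf '%s\n' "$unpinned"
```

On real data this would also list references that have no `@` at all, such as local actions referenced by path, so the result is a starting point for review rather than a verdict.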

How this script works

This is a classic use of Unix pipelines, where I’m chaining together a bunch of built-in text processing tools. Let’s step through how it works.

find . -path '*/.github/workflows/*' -type f -name '*.yml' -print0

This looks for any GitHub Actions workflow file – any file whose name ends with .yml in a folder like .github/workflows/. It prints a list of filenames, like:

./alexwlchan.net/.github/workflows/build_site.yml
./books.alexwlchan.net/.github/workflows/build_site.yml
./concurrently/.github/workflows/main.yml

It prints them with a null byte (\0) between them, which makes it possible to split the filenames in the next step. By default it uses a newline, but a null byte is a bit safer, in case you have filenames which include newline characters.

I know that I always use .yml as a file extension, but if you sometimes use .yaml, you can replace -name '*.yml' with \( -name '*.yml' -o -name '*.yaml' \)

I have a bunch of local repos that are clones of open-source projects, and not my code, so I care less about what GitHub Actions they’re using. I excluded them by adding extra -path rules, like -not -path './cpython/*'.
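Putting those two tweaks together might look like the sketch below. It builds a small throwaway directory tree so it can be run anywhere; the `mysite` folder and the file names are invented for the example, and it uses plain newline-separated output instead of -print0 so the result is easy to read.

```shell
# Hypothetical directory layout in a temporary folder.
root=$(mktemp -d)
mkdir -p "$root/mysite/.github/workflows" "$root/cpython/.github/workflows"
touch "$root/mysite/.github/workflows/build.yml" \
      "$root/mysite/.github/workflows/test.yaml" \
      "$root/mysite/README.md" \
      "$root/cpython/.github/workflows/ci.yml"

# Match both extensions, and skip everything under ./cpython/.
found=$(cd "$root" && find . -path '*/.github/workflows/*' -type f \
    \( -name '*.yml' -o -name '*.yaml' \) \
    -not -path './cpython/*' | sort)
printf '%s\n' "$found"
```

The parentheses are escaped so the shell passes them through to find, where they group the two -name tests before the exclusion is applied.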

xargs -0 grep --no-filename "uses:"

Then we use xargs to pass the filenames to grep. The `-0` flag tells it to split on the null byte, and then it runs grep to look for lines that include "uses:", which is how you use an action in your workflow file.

The --no-filename option means this just prints the matching line, and not the name of the file it comes from. Not all of my files are formatted or indented consistently, so the output is quite messy:

    - uses: actions/checkout@v4
        uses: "actions/cache@v4"
      uses: ruby/setup-ruby@v1
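Here is a small self-contained sketch of these first two steps, run against a throwaway folder. The workflow files and their contents are made up for the example; one filename deliberately contains a space to show why the null-byte separator is useful.

```shell
# Hypothetical workflow files in a temporary folder.
dir=$(mktemp -d)
mkdir -p "$dir/repo/.github/workflows"
printf 'steps:\n  - uses: actions/checkout@v4\n' \
  > "$dir/repo/.github/workflows/my workflow.yml"
printf 'steps:\n  - name: Cache\n    uses: "actions/cache@v4"\n' \
  > "$dir/repo/.github/workflows/main.yml"

# find emits null-separated names; xargs -0 hands them to grep intact,
# even though one of them contains a space.
matches=$(find "$dir" -path '*/.github/workflows/*' -type f -name '*.yml' -print0 \
  | xargs -0 grep --no-filename "uses:" | sort)
printf '%s\n' "$matches"
```

Both matching lines come back with their original indentation and quoting, which is the messy output the later steps tidy up.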

sed 's/\- uses:/uses:/g'

Sometimes there's a leading hyphen, sometimes there isn’t – it depends on whether uses: is the first key in the YAML dictionary. This sed command replaces "- uses:" with "uses:" to start tidying up the data.

    uses: actions/checkout@v4
        uses: "actions/cache@v4"
      uses: ruby/setup-ruby@v1

I know sed is a pretty powerful tool for making changes to text, but I only know a couple of simple commands, like this pattern for replacing text: sed 's/old/new/g'.
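As an illustration of what else the same `s/old/new/` pattern can do, here is a variant (not the command used in this post) that strips the indentation, the optional hyphen and the `uses:` key in one go. It assumes a sed that supports `-E` for extended regular expressions; the sample lines are copied from the output above.

```shell
sample='    - uses: actions/checkout@v4
        uses: "actions/cache@v4"
      uses: ruby/setup-ruby@v1'

# ^[[:space:]]*        leading indentation
# (-[[:space:]]*)?     an optional "- " list marker
# uses:[[:space:]]*    the key and the whitespace after it
stripped=$(printf '%s\n' "$sample" \
  | sed -E 's/^[[:space:]]*(-[[:space:]]*)?uses:[[:space:]]*//')
printf '%s\n' "$stripped"
```

The quotes around `actions/cache@v4` are still there afterwards, so a later step would still need to deal with them.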

tr '"' ' '

Sometimes the name of the action is quoted, sometimes it isn’t. This command replaces any double quotes with spaces, which has the effect of removing them once the next step splits on whitespace.

    uses: actions/checkout@v4
        uses: actions/cache@v4
      uses: ruby/setup-ruby@v1

Now I’m writing this post, it occurs to me I could use sed to make this substitution as well. I reached for tr because I've been using it for longer, and the syntax is simpler for doing single character substitutions: tr '<oldchar>' '<newchar>'
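For comparison, here is a short sketch of the tr command next to a sed equivalent, plus `tr -d`, which deletes the character outright instead of swapping it for a space. The sample line is taken from the output above.

```shell
line='        uses: "actions/cache@v4"'

# Replace each double quote with a space, two ways.
with_tr=$(printf '%s\n' "$line" | tr '"' ' ')
with_sed=$(printf '%s\n' "$line" | sed 's/"/ /g')

# Delete the double quotes entirely.
deleted=$(printf '%s\n' "$line" | tr -d '"')

printf '[%s]\n' "$with_tr" "$with_sed" "$deleted"
```

The first two results are identical and contain extra spaces where the quotes used to be. That is harmless here because the next step splits on whitespace anyway.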

awk '{print $2}'

This splits each line on whitespace, and prints the second token, which is the name of the action:

actions/checkout@v4
actions/cache@v4
ruby/setup-ruby@v1

awk is another powerful text utility that I’ve never learnt properly – I only know how to print the nth word in a string. It has a lot of pattern-matching features I’ve never tried.
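As a taste of those pattern-matching features, the sketch below does the hyphen, quote and field handling in a single awk program. This is an alternative for illustration, not what the script in this post does, and the `name:` line in the sample is added to show that non-matching lines are skipped.

```shell
sample='    - uses: actions/checkout@v4
        uses: "actions/cache@v4"
      uses: ruby/setup-ruby@v1
      name: Not an action line'

actions=$(printf '%s\n' "$sample" | awk '
  { gsub(/["\r]/, "") }                    # drop double quotes and carriage returns
  $1 == "-" && $2 == "uses:" { print $3 }  # lines like "- uses: name"
  $1 == "uses:"              { print $2 }  # lines like "uses: name"
')
printf '%s\n' "$actions"
```

Each `pattern { action }` pair runs only on lines where the pattern is true, so awk takes over the jobs that grep, sed and tr do in the pipeline.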

sed 's/\r//g'

I had a few workflow files which were using carriage returns (\r), and those were included in the awk output. This command gets rid of them, which makes the data more consistent for the final step.
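POSIX does not require sed to understand `\r`, so if this step appears to do nothing on your system, `tr -d '\r'` is an alternative that should behave the same way. A minimal sketch with made-up input:

```shell
# Two lines with Windows-style line endings (\r\n).
cleaned=$(printf 'actions/checkout@v4\r\nactions/cache@v4\r\n' | tr -d '\r')
printf '%s\n' "$cleaned"
```

Without this step, `actions/checkout@v4` and `actions/checkout@v4` followed by a carriage return would be counted as two different lines in the tally.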

sort | uniq --count | sort --numeric-sort

This sorts the lines so identical lines are adjacent, then it groups and counts the lines, and finally it re-sorts to put the most frequent lines at the bottom.

I have this as a shell alias called tally.

   6 actions/cache@v4
   9 ruby/setup-ruby@v1
  59 actions/checkout@v4
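The exact alias definition isn't shown here, so the following is only a guess at an equivalent, written as a shell function because functions also work inside scripts. It uses the short options `-c` and `-n`, which are equivalent to `--count` and `--numeric-sort`.

```shell
# Hypothetical stand-in for the "tally" alias: count identical lines,
# least frequent first.
tally() {
  sort | uniq -c | sort -n
}

# awk normalises the padding that uniq puts in front of each count.
counts=$(printf 'actions/cache@v4\nactions/checkout@v4\nactions/checkout@v4\n' \
  | tally | awk '{print $1, $2}')
printf '%s\n' "$counts"
```

Any command that prints one item per line can be piped into `tally` in the same way.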

This step-by-step approach is how I build Unix text pipelines: I can write a step at a time, and gradually refine and tweak the output until I get the result I want. There are lots of ways to do it, and because this is a script I’ll use once and then discard, I don’t have to worry too much about doing it in the “purest” way – as long as it gets the right result, that’s good enough.

If you use GitHub Actions, you might want to use this script to check your own actions, and see what you’re using. But more than that, I recommend becoming familiar with the Unix text processing tools and pipelines – even in the age of AI, they’re still a powerful and flexible way to cobble together one-off scripts for processing data.
