Fun with Finite State Transducers

Original link: https://blog.yossarian.net/2025/08/14/Fun-with-finite-state-transducers

## Finite State Transducer Improvements in Zizmor

This article details a performance optimization in zizmor, a static analysis tool for GitHub Actions, specifically in its template injection detection. zizmor identifies potential exploits where GitHub Actions expressions escape their intended shell quoting. Initially, this involved a noisy check for expression fences (`${{ ... }}`). To reduce false positives, zizmor needed to understand the "capability" (security level) of each GitHub Actions context (e.g. `github.ref`).

The solution was a finite state transducer (FST) built with Rust's `fst` crate. Rather than using a map or trie, the FST is constructed from a CSV file derived from GitHub's OpenAPI specification, mapping context patterns to their capabilities (arbitrary, structured, or fixed).

This approach brought significant benefits: a much smaller representation (~14.5KB vs. ~240KB), improved speed and memory efficiency, and the ability to precompute the FST at compile time, eliminating runtime initialization costs. The FST efficiently handles context patterns and syntactic variations, providing a compact and high-performance way to determine context safety during template injection analysis. While arguably overkill for the current dataset size, the FST unlocks future optimization possibilities such as regex-based search and further compression techniques.


Fun with finite state transducers

Programming, philosophy, pedaling.


Aug 14, 2025     Tags: devblog, programming, rust, zizmor    


I recently solved an interesting problem inside zizmor with a type of state machine/automaton I hadn’t used before: a finite state transducer (FST).

This is just a quick write-up of the problem and how I solved it. It doesn’t go particularly deep into the data structures themselves. For more information on FSTs themselves, I strongly recommend burntsushi’s article on transducers (which is what actually led me to his fst crate).

TL;DR: I used the fst crate to build a finite state transducer that maps GitHub Actions context patterns to their logical “capability” in the context of a potential template injection vulnerability. This ended up being an order of magnitude smaller in terms of representation (~14.5KB instead of ~240KB) and faster and more memory efficient than my naïve initial approaches (tables and prefix trie walks). It also enabled me to fully precompute the FST at compile time, eliminating the startup cost of priming a trie- or table-based map.

Background

zizmor is a static analysis tool for GitHub Actions.

One of the categories of weaknesses it can find is template injections, wherein the CI author uses a GitHub Actions expression in a shell or similar context without realizing that the expression can escape any shell-level quoting intended to “defuse” it.

Here’s an example, derived from a pattern that gets exploited over and over again:

- name: "Print the current ref"
  run: |
    echo "The current ref is: ${{ github.ref }}"

If this step is part of a workflow that grants elevated privileges to third parties (like pull_request_target), an attacker can contrive a git ref that escapes the shell quoting and runs arbitrary code.

For example, the following ref:

innocent";cat${IFS}/etc/passwd;true${IFS}"

…would expand as:

echo "The current ref is: innocent";cat /etc/passwd;true ""

Fortunately, zizmor detects these:

zizmor hackme.yml

error[template-injection]: code injection via template expansion
  --> hackme.yml:15:41
   |
14 |         run: |
   |         ^^^ this run block
15 |           echo "The current ref is: ${{ github.ref }}"
   |                                         ^^^^^^^^^^ may expand into attacker-controllable code
   |
   = note: audit confidence → High
   = note: this finding has an auto-fix

The problem

There’s a very simple way to detect these vulnerabilities: we could walk every code “sink” in a given workflow (e.g. run: blocks, action inputs that are known to contain code, &c.) and look for the fences of an expression (${{ ... }}). If we see those fences, we know that the contents are a potential injection risk.
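To make this concrete, here's a naive sketch of that fence-scanning idea (hypothetical code, not zizmor's implementation):

fn expression_spans(sink: &str) -> Vec<(usize, usize)> {
    // Report the span of every `${{ ... }}` expression in a code sink,
    // such as the contents of a `run:` block.
    let mut spans = Vec::new();
    let mut pos = 0;
    while let Some(offset) = sink[pos..].find("${{") {
        let start = pos + offset;
        // Find the closing fence; if there is none, stop scanning.
        let Some(close) = sink[start + 3..].find("}}") else {
            break;
        };
        let end = start + 3 + close + 2;
        spans.push((start, end));
        pos = end;
    }
    spans
}

fn main() {
    let run = r#"echo "The current ref is: ${{ github.ref }}""#;
    for (start, end) in expression_spans(run) {
        // Flags `${{ github.ref }}` as a potential injection risk.
        println!("potential injection: {}", &run[start..end]);
    }
}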

This is appealing for reasons of simplicity, but is unacceptably noisy:

  • There are many actions expressions that are trivially safe, or non-trivial but deductively safe:

    • Literals, e.g. ${{ true }} or ${{ 'lol' }};
    • Any expression that can only expand to a literal:

      
      # only ever expands to 'main' or 'not-main'
      # despite using the github.ref context
      ${{ github.ref == 'main' && 'main' || 'not-main' }}
      
    • Any expression that can’t expand to meaningful code, e.g. due to the expression’s type:

      
      # only ever expands to a number
      ${{ github.run_number }}
      
      # only ever expands to `true` or `false`
      ${{ github.ref == 'main' }}
      
  • There are many expressions that can appear unsafe by virtue of dataflow or context expansion, but are actually safe because of the context’s underlying type or constraints:

    • ${{ github.event.pull_request.merged }} is populated by GitHub’s backend and can only expand to true or false, but requires us to know a priori that it’s a “safe” context;
    • ${{ github.actor }} is an arbitrary string, but is limited in structure to characters that make it infeasible to perform a useful injection with (no semicolons, $, &c.).

zizmor generally aims to present low-noise findings, so filtering these out by default is paramount.

The first group is pretty easy: we can do a small amount of dataflow analysis to determine whether an expression’s evaluation is “tainted” by arbitrary controllable inputs.

The second group is harder, because it requires us to know additional facts about arbitrary-looking contexts. The two main facts we care about are type (whether a context expands to a string, a number, or something else) and capability (whether the expansion is fully arbitrary, or constrained in some manner that might make it safe or at least less risky). In practice these both collapse down to capability, since we can categorize certain types (e.g. booleans and numbers) as inherently safe.
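Concretely (and borrowing the capability names that appear later in this post), that collapse can be pictured as a small Rust enum; the doc comments here are illustrative rather than zizmor's exact definitions:

/// How risky a context's expansion is for template injection purposes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Capability {
    /// Fully attacker-controllable (e.g. an issue title).
    Arbitrary,
    /// Constrained in structure (e.g. a URL), but not fully fixed.
    Structured,
    /// Fixed or inherently safe (e.g. booleans, numbers, enumerated strings).
    Fixed,
}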

Fact finding

So, what we want is a way to collect facts about every valid GitHub Actions context.

The trick to this lies in remembering that, under the hood, GitHub Actions is driven by GitHub’s webhooks API: most of the context state loaded into a GitHub Actions workflow run is derived from the webhook payload corresponding to the event that triggered the workflow.

So, how do we get a list of all valid contexts along with information about their expansion? GitHub doesn’t provide this directly, but we can derive it from their OpenAPI specification for the webhooks API.

This comes in the form of a ~4.5MB OpenAPI schema, which is a pain in the ass to work with directly: it’s heavily self-referential (by necessity, since an “unrolled” version with inline schemas for each property would be infeasibly large), heavily telescoped (also by necessity, since GitHub’s API responses themselves are not particularly flat), and makes ample use of OpenAPI constructions like oneOf, anyOf, and allOf that require careful additional handling.

At the bottom of all of this, however, is our reward: detailed information about every property provided by each webhook event, including the property’s type and valuable information about how the property is constrained.

For example, here’s the schema for a pull_request.state property:

"state": {
    "description": "State of this Pull Request. Either `open` or `closed`.",
    "enum": ["open", "closed"],
    "type": "string",
    "examples": ["open"]
}

This tells us that pull_request.state is a string, but that its value is constrained to either open or closed. We categorize this as having a “fixed” capability, since we know that the attacker can’t control the structure of the value itself in a meaningful way.
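To illustrate the kind of bucketing this implies, here's a rough sketch of classifying a single property schema (the real extraction is the Python helper script described below; the treatment of `format`-constrained strings here is my own assumption for illustration):

use serde_json::Value;

// Bucket one webhook property schema into a capability.
fn classify(schema: &Value) -> &'static str {
    match schema.get("type").and_then(Value::as_str) {
        // Booleans and numbers can't expand into meaningful code.
        Some("boolean") | Some("integer") | Some("number") => "fixed",
        // Enumerated strings (like `pull_request.state`) are fixed.
        Some("string") if schema.get("enum").is_some() => "fixed",
        // Format-constrained strings (e.g. `uri`) are structured (assumption).
        Some("string") if schema.get("format").is_some() => "structured",
        // Everything else is assumed attacker-controllable.
        _ => "arbitrary",
    }
}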

Long story short: this is implemented as a helper script within zizmor called webhooks-to-contexts.py. This script is run periodically in GitHub Actions and walks the OpenAPI schema to produce a CSV, context-capabilities.csv, that looks like this:

context capability
github.event.pull_request.active_lock_reason arbitrary
github.event.pull_request.additions fixed
github.event.pull_request.allow_auto_merge fixed
github.event.pull_request.allow_update_branch fixed
github.event.pull_request.assignee fixed
github.event.pull_request.assignee.avatar_url structured
github.event.pull_request.assignee.deleted fixed
github.event.pull_request.assignee.email arbitrary
github.event.pull_request.assignee.events_url arbitrary
github.event.pull_request.assignee.followers_url structured
github.event.pull_request.assignee.following_url arbitrary
github.event.pull_request.assignee.gists_url arbitrary
github.event.pull_request.assignee.gravatar_id arbitrary
github.event.pull_request.assignee.html_url structured

…to the tune of about 4000 unique contexts.

The bigger problem

So, we have a few thousand contexts, each with a capability that tells us how much of a risk that context poses in terms of template injection. We can just shove these into a map and call it a day, right?

Wrong. We’ve glossed over a significant wrinkle, which is that context accesses in GitHub Actions are not themselves always literal. Instead, they can be patterns that can expand to multiple values.

A good example of this is github.event.pull_request.labels: labels is an array of objects, each of which has a name property corresponding to the label’s actual name. To access these, we can use syntaxes that select individual labels or all labels:

# access the first label's name
github.event.pull_request.labels[0].name

# access all labels' names
github.event.pull_request.labels.*.name

In both cases, we want to apply the same capability to the context’s expansion. To make things even more ~~complicated~~ exciting, GitHub’s own context access syntax is surprisingly malleable: each of the following is a valid and equivalent way to access the first label’s name:

github.event.pull_request.labels[0].name
github.EVENT.PULL_REQUEST.LABELS[0].NAME
github['event']['pull_request']['labels'][0]['name']
github['event'].pull_request['labels'][0].name
github.event.pULl_ReQUEst['LaBEls'][0].nAmE

…and so forth.

In sum, we have two properties that blow a hole in our “just shove it in a map” approach:

  1. Contexts are patterned, and can’t be expanded into a static finite enumeration of simplified contexts. For example, we can’t know how many labels a repository has, so we can’t statically unfold the github.event.pull_request.labels.*.name context into N contexts that match everything the user might write in a workflow.
  2. Contexts can be expressed through multiple syntaxes. We can keep things simple on the capability extraction side by only using the jq-ish syntax, but we still need to normalize contexts as they appear in workflows (a sketch of this normalization follows below). This is not very difficult, but it makes a single map lookup even less appealing.
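As a sketch of that normalization (hypothetical code, not zizmor's implementation), one might lowercase every segment and rewrite quoted index accesses into dotted form, leaving numeric indices intact:

fn normalize_context(raw: &str) -> String {
    let mut out = String::new();
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '[' => match chars.peek().copied() {
                // Quoted index like ['event'] or ["event"]: rewrite as `.event`.
                Some(q) if q == '\'' || q == '"' => {
                    chars.next(); // consume the opening quote
                    // take_while also consumes the closing quote.
                    let key: String = chars.by_ref().take_while(|&c| c != q).collect();
                    chars.next(); // consume the closing `]`
                    out.push('.');
                    out.push_str(&key.to_ascii_lowercase());
                }
                // Numeric (or `*`) index: keep the bracket syntax as-is.
                _ => out.push('['),
            },
            _ => out.push(c.to_ascii_lowercase()),
        }
    }
    out
}

fn main() {
    assert_eq!(
        normalize_context("github['event'].pull_request['LaBEls'][0].nAmE"),
        "github.event.pull_request.labels[0].name"
    );
}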

The solution

To recap:

  • We have a few thousand contexts, each of which is really a pattern that can match one or more “concrete” context usages as they appear in a user’s workflow.
  • Each of these context patterns has an associated capability, as one of fixed | structured | arbitrary, indicating how much of a risk the context poses in terms of template injection.
  • Our goal is to efficiently match these patterns against the contexts as they appear in a user’s workflow, and return the associated capability.

To me, this originally smacked of a prefix/radix trie problem: there are a large number of common prefixes/segments in the pattern set, meaning that the trie could be made relatively compact. However, tries are optimized for operations that the problem doesn’t require:

  • Tries are optimized to grow and shrink at runtime, but we don’t need that: the set of context patterns is static, and we’d ideally trade off some compile-time cost for runtime size and speed.
  • Tries can perform efficient prefix and exact matches, but (typically) at the cost of a larger runtime memory footprint.

Finally, on a more practical level: I couldn’t find a great trie/radix trie crate to use. Some of this might have been a discovery failure on my part, but I couldn’t find one that was already widely used and still actively maintained. radix_tree came the closest, but hasn’t been updated in nearly 5 years.

While reading about other efficient prefix representation structures, I came across DAFSAs (also sometimes called DAWGs). These offer a significantly more compact representation of prefixes than a trie, but at a cost: unlike a trie, a DAFSA cannot contain auxiliary data. This makes them great for inclusion checking (including of prefixes), but not so great for my purpose of storing each pattern’s associated capability.

That brought me to transducers as a class of finite state machines: unlike acceptors (DAFSAs, but also normal DFAs and NFAs) that map from an input sequence to a boolean accept/reject state, transducers map an input sequence to an output sequence. That output sequence can then be composed (e.g. via summation) into an output value. In effect, the “path” an input takes through the transducer yields an output value.

In this way, FSTs can behave a lot like a map (whether backed by a prefix trie or a hash table), but with some appealing additional properties:

  • A finite state transducer can compress its entire input, unlike a prefix trie (which only compresses the prefixes). That means a more compact representation.
  • A finite state transducer can share duplicated output values across inputs, unlike a prefix trie or hash table (which would store the same output value multiple times). This also means a more compact representation, and is particularly appealing in our case as we only have a small set of possible capabilities.

These desirable properties come with downsides too: optimal FST construction requires memory proportional to the total input size, and requires ordered insertion of each input. Modifications to an FST are also limited: optimal insertions must be ordered, while deletions or changes to an associated value require a full rebuild of the FST. In practice, this means that FST construction is a static affair over a preprocessed input set. But that’s perfectly fine for my use case!
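As a minimal illustration of that map-like behavior and the ordered-insertion requirement, here's a toy example using the fst crate's in-memory builder (the key-to-value assignments here are made up):

use fst::{Map, MapBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Keys must be inserted in lexicographic order; values are u64s,
    // and duplicated output values are shared across inputs.
    let mut builder = MapBuilder::memory();
    builder.insert("github.actor", 1)?; // structured
    builder.insert("github.event.issue.title", 0)?; // arbitrary
    builder.insert("github.run_number", 2)?; // fixed

    let map = Map::new(builder.into_inner()?)?;
    assert_eq!(map.get("github.run_number"), Some(2));
    assert_eq!(map.get("github.nonexistent"), None);
    Ok(())
}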

Putting it together

As it turns out, using the fst crate to construct an FST at build time is pretty simple. Here’s the totality of the code that I put in build.rs to transform context-capabilities.csv:

use std::{env, fs::File, io, path::Path};

fn do_context_capabilities() {
    let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap();
    let source = Path::new(&manifest_dir).join("data/context-capabilities.csv");

    println!(
        "cargo::rerun-if-changed={source}",
        source = source.display()
    );

    let out_dir = env::var("OUT_DIR").unwrap();
    let out_path = Path::new(&out_dir).join("context-capabilities.fst");

    let out = io::BufWriter::new(File::create(out_path).unwrap());
    let mut build = fst::MapBuilder::new(out).unwrap();

    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .from_path(source)
        .unwrap();

    for record in rdr.records() {
        let record = record.unwrap();
        let context = record.get(0).unwrap();
        let capability = match record.get(1).unwrap() {
            "arbitrary" => 0,
            "structured" => 1,
            "fixed" => 2,
            _ => panic!("Unknown capability"),
        };

        build.insert(context, capability).unwrap();
    }

    build.finish().unwrap();
}

…and then, loading and querying it:

use std::sync::LazyLock;

use fst::Map;

static CONTEXT_CAPABILITIES_FST: LazyLock<Map<&[u8]>> = LazyLock::new(|| {
    fst::Map::new(include_bytes!(concat!(env!("OUT_DIR"), "/context-capabilities.fst")).as_slice())
        .expect("couldn't initialize context capabilities FST")
});


impl Capability {
    fn from_context(context: &str) -> Option<Self> {
        match CONTEXT_CAPABILITIES_FST.get(context) {
            Some(0) => Some(Capability::Arbitrary),
            Some(1) => Some(Capability::Structured),
            Some(2) => Some(Capability::Fixed),
            Some(_) => unreachable!("unexpected context capability"),
            _ => None,
        }
    }
}

(where the input context has been normalized, e.g. from github["event"]["pull_request"]["title"] to github.event.pull_request.title).
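For example, a hypothetical call site (the capability actually returned for any given context depends on the generated context-capabilities.csv):

match Capability::from_context("github.event.pull_request.title") {
    // Attacker-controllable: flag as a potential injection.
    Some(Capability::Arbitrary) => { /* report a finding */ }
    // Structured or fixed: safe enough to filter out by default.
    Some(_) => { /* suppress or downgrade */ }
    // Unknown context: fall back to a conservative default.
    None => { /* treat as arbitrary */ }
}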

Concluding thoughts

Everything above was included in zizmor 1.9.0, as part of a large-scale refactor of the template-injection audit.

Was this overkill? Probably: I only have ~4000 valid context patterns, which would have comfortably fit into a hash table.

However, using an FST for this makes the footprint ludicrously and satisfyingly small: each pattern takes less than 4 bytes to represent in the serialized FST, well below the roughly linear memory footprint of loading the equivalent data from context-capabilities.csv.

Using an FST also unlocked future optimization ideas that I haven’t bothered to experiment with yet:

  • The FST is currently searched using a normalization of each context as it appears in a workflow. However, FSTs can be searched by any DFA, meaning that I could in theory convert each context into a regular expression instead. I’m unclear on whether this would bring a performance advantage, since the context-to-regex-to-DFA conversion itself is not necessarily cheap.
  • In principle, the FST’s size could be squeezed down even further by splitting the context patterns into segments, rather than decomposing into sequences of bytes. This comes from the observation that many vertices in the FST are shared and singular, meaning that they only have one incoming edge and one outgoing edge. I don’t think the fst crate supports this natively, but it would yield the prefix deduplication benefits of a prefix trie while still preserving the compression benefits of an FST.

