Aug 14, 2025 Tags: devblog, programming, rust, zizmor
I recently solved an interesting problem inside zizmor
with a type of state machine/automaton I hadn’t used before: a
finite state transducer (FST).
This is just a quick write-up of the problem and how I solved it. It doesn’t
go particularly deep into the data structures themselves. For more information
on FSTs, I strongly recommend burntsushi’s article on transducers
(which is what actually led me to his `fst` crate).
TL;DR: I used the `fst` crate to build a finite state transducer that
maps GitHub Actions context patterns to their logical “capability” in
the context of a potential template injection vulnerability. This ended up
being an order of magnitude smaller in terms of representation (~14.5 KB instead
of ~240 KB) and faster and more memory efficient than my naïve
initial approaches (tables and prefix trie walks). It also enabled me
to fully precompute the FST at compile time, eliminating the startup cost
of priming a trie- or table-based map.
Background
`zizmor` is a static analysis tool for GitHub Actions.
One of the categories of weaknesses it can find is template injections, wherein the CI author uses a GitHub Actions expression in a shell or similar context without realizing that the expression can escape any shell-level quoting intended to “defuse” it.
Here’s an example, derived from a pattern that gets exploited over and over again:
```yaml
- name: "Print the current ref"
  run: |
    echo "The current ref is: ${{ github.ref }}"
```
If this step is part of a workflow that grants elevated privileges to third
parties (like `pull_request_target`), an attacker can contrive a git
ref that escapes the shell quoting and runs arbitrary code.
For example, the following ref:
```
innocent";cat${IFS}/etc/passwd;true${IFS}"
```
…would expand as:
```bash
echo "The current ref is: innocent";cat /etc/passwd;true ""
```
Fortunately, `zizmor` detects these:
```console
zizmor hackme.yml

error[template-injection]: code injection via template expansion
  --> hackme.yml:15:41
   |
14 |         run: |
   |         ^^^ this run block
15 |           echo "The current ref is: ${{ github.ref }}"
   |                                         ^^^^^^^^^^ may expand into attacker-controllable code
   |
   = note: audit confidence → High
   = note: this finding has an auto-fix
```
The problem
There’s a very simple way to detect these vulnerabilities: we could walk every
code “sink” in a given workflow (e.g. `run:` blocks, action inputs that are
known to contain code, &c.) and look for the fences of an expression
(`${{ ... }}`). If we see those fences, we know that the contents
are a potential injection risk.
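Concretely, the naive approach might look something like this sketch (`naive_injection_candidates` is a hypothetical helper, not zizmor code):

```rust
/// Flag every `${{ ... }}` fence that appears in a code sink.
fn naive_injection_candidates(sink: &str) -> Vec<&str> {
    let mut found = Vec::new();
    let mut rest = sink;
    while let Some(start) = rest.find("${{") {
        // find the closing fence for this candidate expression
        let Some(end) = rest[start..].find("}}") else { break };
        found.push(&rest[start..start + end + 2]);
        rest = &rest[start + end + 2..];
    }
    found
}
```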
This is appealing for reasons of simplicity, but is unacceptably noisy:
- There are many actions expressions that are trivially safe, or non-trivial but deductively safe:

  - Literals, e.g. `${{ true }}` or `${{ 'lol' }}`;
  - Any expression that can only expand to a literal:

    ```
    # only ever expands to 'main' or 'not-main'
    # despite using the github.ref context
    ${{ github.ref == 'main' && 'main' || 'not-main' }}
    ```

  - Any expression that can’t expand to meaningful code, e.g. due to the expression’s type:

    ```
    # only ever expands to a number
    ${{ github.run_number }}

    # only ever expands to `true` or `false`
    ${{ github.ref == 'main' }}
    ```
- There are many expressions that can appear unsafe by virtue of dataflow or context expansion, but are actually safe because of the context’s underlying type or constraints:

  - `${{ github.event.pull_request.merged }}` is populated by GitHub’s backend and can only expand to `true` or `false`, but requires us to know a priori that it’s a “safe” context;
  - `${{ github.actor }}` is an arbitrary string, but is limited in structure to characters that make it infeasible to perform a useful injection with (no semicolons, `$`, &c.).
`zizmor` generally aims to present low-noise findings, so filtering these
out by default is paramount.
The first group is pretty easy: we can do a small amount of dataflow analysis to determine whether an expression’s evaluation is “tainted” by arbitrary controllable inputs.
The second group is harder, because it requires us to know additional facts about arbitrary-looking contexts. The two main facts we care about are type (whether a context expands to a string, a number, or something else) and capability (whether the expansion is fully arbitrary, or constrained in some manner that might make it safe or at least less risky). In practice these both collapse down to capability, since we can categorize certain types (e.g. booleans and numbers) as inherently safe.
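As a minimal sketch of that collapse (the `Capability` variants match the enum used later in this post; `collapse` itself is a hypothetical helper):

```rust
/// How risky a context's expansion is, from the template-injection
/// audit's perspective.
enum Capability {
    /// Free-form and potentially attacker-controlled.
    Arbitrary,
    /// Constrained in structure (e.g. a URL), hard to inject through.
    Structured,
    /// Effectively constant: booleans, numbers, enum-constrained strings.
    Fixed,
}

/// Types that can never expand into meaningful code collapse directly
/// into `Fixed`; strings need further constraint analysis.
fn collapse(context_type: &str) -> Option<Capability> {
    match context_type {
        "boolean" | "number" => Some(Capability::Fixed),
        "string" => None, // depends on the string's constraints
        _ => None,
    }
}
```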
Fact finding
So, what we want is a way to collect facts about every valid GitHub Actions context.
The trick to this lies in remembering that, under the hood, GitHub Actions is driven by GitHub’s webhooks API: most of the context state loaded into a GitHub Actions workflow run is derived from the webhook payload corresponding to the event that triggered the workflow.
So, how do we get a list of all valid contexts along with information about their expansion? GitHub doesn’t provide this directly, but we can derive it from their OpenAPI specification for the webhooks API.
This comes in the form of a ~4.5 MB OpenAPI schema, which is a pain in the ass
to work with directly: it’s heavily self-referential (by necessity,
since an “unrolled” version with inline schemas for each property would
be infeasibly large), heavily telescoped (also by necessity, since
GitHub’s API responses themselves are not particularly flat), and makes ample
use of OpenAPI constructions like `oneOf`, `anyOf`, and `allOf` that require
careful additional handling.
At the bottom of all of this, however, is our reward: detailed information about every property provided by each webhook event, including the property’s type and valuable information about how the property is constrained.
For example, here’s the schema for a `pull_request.state` property:

```json
"state": {
  "description": "State of this Pull Request. Either `open` or `closed`.",
  "enum": ["open", "closed"],
  "type": "string",
  "examples": ["open"]
}
```
This tells us that `pull_request.state` is a string, but that its value
is constrained to either `open` or `closed`. We categorize this as having
a “fixed” capability, since we know that the attacker can’t control the
structure of the value itself in a meaningful way.
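As a rough, Rust-flavored sketch of the classification rule (the `format`-based “structured” case is my assumption here; the real script handles far more of OpenAPI, including `$ref` resolution and `oneOf`/`anyOf`/`allOf`):

```rust
use serde_json::Value;

/// Classify a single webhook schema property into a capability string,
/// using simplified heuristics.
fn classify(property: &Value) -> &'static str {
    match property.get("type").and_then(Value::as_str) {
        // booleans and numbers can never expand into meaningful code
        Some("boolean") | Some("integer") | Some("number") => "fixed",
        Some("string") => {
            if property.get("enum").is_some() {
                // e.g. `pull_request.state` above: only `open` or `closed`
                "fixed"
            } else if property.get("format").is_some() {
                // e.g. a `uri`: constrained shape, but not constant
                "structured"
            } else {
                "arbitrary"
            }
        }
        _ => "arbitrary",
    }
}
```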
Long story short: this is implemented as a helper script within zizmor
called `webhooks-to-contexts.py`. This script is run periodically in GitHub
Actions and walks the OpenAPI schema to produce a CSV,
`context-capabilities.csv`, that looks like this:
| context | capability |
|---|---|
| github.event.pull_request.active_lock_reason | arbitrary |
| github.event.pull_request.additions | fixed |
| github.event.pull_request.allow_auto_merge | fixed |
| github.event.pull_request.allow_update_branch | fixed |
| github.event.pull_request.assignee | fixed |
| github.event.pull_request.assignee.avatar_url | structured |
| github.event.pull_request.assignee.deleted | fixed |
| github.event.pull_request.assignee.email | arbitrary |
| github.event.pull_request.assignee.events_url | arbitrary |
| github.event.pull_request.assignee.followers_url | structured |
| github.event.pull_request.assignee.following_url | arbitrary |
| github.event.pull_request.assignee.gists_url | arbitrary |
| github.event.pull_request.assignee.gravatar_id | arbitrary |
| github.event.pull_request.assignee.html_url | structured |
…to the tune of about 4000 unique contexts.
The bigger problem
So, we have a few thousand contexts, each with a capability that tells us how much of a risk that context poses in terms of template injection. We can just shove these into a map and call it a day, right?
Wrong. We’ve glossed over a significant wrinkle, which is that context accesses in GitHub Actions are not themselves always literal. Instead, they can be patterns that can expand to multiple values.
A good example of this is `github.event.pull_request.labels`: `labels` is
an array of objects, each of which has a `name` property corresponding
to the label’s actual name. To access these, we can use syntaxes that select
individual labels or all labels:
```
# access the first label's name
github.event.pull_request.labels[0].name

# access all labels' names
github.event.pull_request.labels.*.name
```
In both cases, we want to apply the same capability to the context’s expansion.
To make things even more ~~complicated~~ exciting, GitHub’s own context
access syntax is surprisingly malleable: each of the following is a valid
and equivalent way to access the first label’s name:
```
github.event.pull_request.labels[0].name
github.EVENT.PULL_REQUEST.LABELS[0].NAME
github['event']['pull_request']['labels'][0]['name']
github['event'].pull_request['labels'][0].name
github.event.pULl_ReQUEst['LaBEls'][0].nAmE
```
…and so forth.
In sum, we have two properties that blow a hole in our “just shove it in a map” approach:
- Contexts are patterned, and can’t be expanded into a static finite enumeration of simplified contexts. For example, we can’t know how many labels a repository has, so we can’t statically unfold the `github.event.pull_request.labels.*.name` context into `N` contexts that match everything the user might write in a workflow.
- Contexts can be expressed through multiple syntaxes. We can keep things simple on the capability extraction side by only using the `jq`-ish syntax, but we still need to normalize any contexts as they appear in workflows (sketched below). This is not very difficult, but it makes a single map lookup even less appealing.
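Here’s a minimal sketch of that normalization, assuming a normal form of lowercase, dot-separated segments (`normalize_context` is illustrative, and the index-to-`*` rule is my assumption about how indexed accesses line up with the `.*.` patterns; zizmor’s real normalization is more involved):

```rust
/// Normalize a context access into lowercase dotted form, e.g.
/// `github['event'].PULL_REQUEST.labels[0].name` into
/// `github.event.pull_request.labels.*.name`.
fn normalize_context(raw: &str) -> String {
    let mut out = String::new();
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '[' {
            // collect the bracketed segment up to the matching `]`
            let mut inner = String::new();
            for c in chars.by_ref() {
                if c == ']' {
                    break;
                }
                inner.push(c);
            }
            let inner = inner.trim_matches(|c| c == '\'' || c == '"');
            out.push('.');
            if inner.chars().all(|c| c.is_ascii_digit()) {
                // assumption: index accesses collapse into the `*` segment
                out.push('*');
            } else {
                out.push_str(&inner.to_ascii_lowercase());
            }
        } else {
            out.push(c.to_ascii_lowercase());
        }
    }
    out
}
```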
The solution
To recap:
- We have a few thousand contexts, each of which is really a pattern that can match one or more “concrete” context usages as they appear in a user’s workflow.
- Each of these context patterns has an associated capability, one of `fixed | structured | arbitrary`, indicating how much of a risk the context poses in terms of template injection.
- Our goal is to efficiently match these patterns against the contexts as they appear in a user’s workflow, and return the associated capability.
To me, this originally smacked of a prefix/radix trie problem: there are a large number of common prefixes/segments in the pattern set, meaning that the trie could be made relatively compact. However, tries are optimized for operations that the problem doesn’t require:
- Tries are optimized to grow and shrink at runtime, but we don’t need that: the set of context patterns is static, and we’d ideally trade off some compile-time cost for runtime size and speed.
- Tries can perform efficient prefix and exact matches, but (typically) at the cost of a larger runtime memory footprint.
Finally, on a more practical level: I couldn’t find a great trie/radix trie
crate to use. Some of this might have been a discovery failure on my part,
but I couldn’t find one that was already widely used and still actively
maintained. `radix_tree` came the closest, but hasn’t been updated in nearly
5 years.
While reading about other efficient prefix representation structures, I came across DAFSAs (also sometimes called DAWGs). These offer a significantly more compact representation of prefixes than a trie, but at a cost: unlike a trie, a DAFSA cannot contain auxiliary data. This makes them great for inclusion checking (including of prefixes), but not so great for my purpose of storing each pattern’s associated capability.
That brought me to transducers as a class of finite state machines: unlike acceptors (DAFSAs, but also normal DFAs and NFAs) that map from an input sequence to a boolean accept/reject state, transducers map an input sequence to an output sequence. That output sequence can then be composed (e.g. via summation) into an output value. In effect, the “path” an input takes through the transducer yields an output value.
In this way, FSTs can behave a lot like a map (whether backed by a prefix trie or a hash table), but with some appealing additional properties:
- A finite state transducer can compress its entire input, unlike a prefix trie (which only compresses the prefixes). That means a more compact representation.
- A finite state transducer can share duplicated output values across inputs, unlike a prefix trie or hash table (which would store the same output value multiple times). This also means a more compact representation, and is particularly appealing in our case as we only have a small set of possible capabilities.
These desirable properties come with downsides too: optimal FST construction requires memory proportional to the total input size, and requires ordered insertion of each input. Modifications to an FST are also limited: optimal insertions must be ordered, while deletions or changes to an associated value require a full rebuild of the FST. In practice, this means that FST construction is a static affair over a preprocessed input set. But that’s perfectly fine for my use case!
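To see the map-like behavior in miniature, here’s a tiny self-contained example with the `fst` crate (the keys and capability values here are illustrative):

```rust
use fst::Map;

fn main() -> Result<(), fst::Error> {
    // `from_iter` requires the keys in lexicographic order; unsorted
    // input is a construction error.
    let map = Map::from_iter([
        ("github.actor", 1u64),      // illustrative: structured
        ("github.head_ref", 0u64),   // illustrative: arbitrary
        ("github.run_number", 2u64), // illustrative: fixed
    ])?;

    // lookups behave like an ordinary map with u64 values
    assert_eq!(map.get("github.head_ref"), Some(0));
    assert_eq!(map.get("github.does_not_exist"), None);
    Ok(())
}
```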
Putting it together
As it turns out, using the `fst` crate to construct an FST at build time
is pretty simple. Here’s the totality of the code that I put in
`build.rs` to transform `context-capabilities.csv`:
```rust
use std::{env, fs::File, io, path::Path};

fn do_context_capabilities() {
    let manifest_dir = env::var("CARGO_MANIFEST_DIR").unwrap();
    let source = Path::new(&manifest_dir).join("data/context-capabilities.csv");
    println!(
        "cargo::rerun-if-changed={source}",
        source = source.display()
    );

    let out_dir = env::var("OUT_DIR").unwrap();
    let out_path = Path::new(&out_dir).join("context-capabilities.fst");
    let out = io::BufWriter::new(File::create(out_path).unwrap());

    let mut build = fst::MapBuilder::new(out).unwrap();

    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .from_path(source)
        .unwrap();

    for record in rdr.records() {
        let record = record.unwrap();
        let context = record.get(0).unwrap();
        // Each capability is encoded as a small integer, since the
        // FST's outputs are u64 values.
        let capability = match record.get(1).unwrap() {
            "arbitrary" => 0,
            "structured" => 1,
            "fixed" => 2,
            _ => panic!("Unknown capability"),
        };

        // NB: `MapBuilder::insert` requires keys in lexicographic
        // order, which the CSV already satisfies.
        build.insert(context, capability).unwrap();
    }

    build.finish().unwrap();
}
```
…and then, loading and querying it:
```rust
use std::sync::LazyLock;

use fst::Map;

static CONTEXT_CAPABILITIES_FST: LazyLock<Map<&[u8]>> = LazyLock::new(|| {
    Map::new(include_bytes!(concat!(env!("OUT_DIR"), "/context-capabilities.fst")).as_slice())
        .expect("couldn't initialize context capabilities FST")
});

impl Capability {
    fn from_context(context: &str) -> Option<Self> {
        match CONTEXT_CAPABILITIES_FST.get(context) {
            Some(0) => Some(Capability::Arbitrary),
            Some(1) => Some(Capability::Structured),
            Some(2) => Some(Capability::Fixed),
            Some(_) => unreachable!("unexpected context capability"),
            _ => None,
        }
    }
}
```
(where the input `context` has been normalized, e.g. from
`github["event"]["pull_request"]["title"]` to
`github.event.pull_request.title`).
Concluding thoughts
Everything above was included in zizmor 1.9.0, as part of a large-scale refactor of the template-injection audit.
Was this overkill? Probably: I only have ~4000 valid context patterns, which would have comfortably fit into a hash table.
However, using an FST for this makes the footprint ludicrously and satisfyingly
small: each pattern takes less than 4 bytes to represent in the serialized FST,
well below the roughly linear memory footprint of loading the equivalent
data from `context-capabilities.csv`.
Using an FST also unlocked future optimization ideas that I haven’t bothered to experiment with yet:
- The FST is currently searched using a normalization of each context as it appears in a workflow. However, FSTs can be searched by any DFA, meaning that I could in theory convert each context into a regular expression instead (see the sketch after this list). I’m unclear on whether this would bring a performance advantage, since the context-to-regex-to-DFA conversion itself is not necessarily cheap.
- In principle, the FST’s size could be squeezed down even further by splitting the context patterns into segments, rather than decomposing them into sequences of bytes. This comes from the observation that many vertices in the FST are shared and singular, meaning that they only have one incoming edge and one outgoing edge. I don’t think the `fst` crate supports this natively, but it would yield the prefix deduplication benefits of a prefix trie while still preserving the compression benefits of an FST.
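For the first of these, here’s a minimal sketch of what automaton-driven search over the FST looks like, using the `fst` crate’s built-in `Str` automaton with a prefix query standing in for a full regex automaton (`contexts_under` is a hypothetical helper, not zizmor code):

```rust
use fst::{automaton::Str, Automaton, IntoStreamer, Map, Streamer};

/// Stream every (context, capability) pair whose context starts with
/// `prefix`, by running a `starts_with` automaton over the FST.
fn contexts_under(map: &Map<&[u8]>, prefix: &str) -> Vec<(String, u64)> {
    let query = Str::new(prefix).starts_with();
    let mut stream = map.search(query).into_stream();

    let mut results = Vec::new();
    while let Some((context, capability)) = stream.next() {
        results.push((String::from_utf8_lossy(context).into_owned(), capability));
    }
    results
}
```

A regex-derived automaton (via an `Automaton` implementation) would plug into `search` the same way; the open question is whether building that automaton per context is actually cheaper than normalizing and doing an exact lookup.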