On December 30th, while most of us were preparing for a New Year’s Eve celebration, the US Treasury was preparing a notice to tell lawmakers that the Treasury’s systems, which (obviously) contain highly sensitive, confidential data, had been compromised.
(Honestly, I’m not sure how I missed this news. Usually I’m pretty plugged in, especially to, like, open-source software vulnerabilities that affect my government’s treasury department. 🤷‍♂️)
For compliance reasons, the US Treasury sent this notice to US lawmakers, breaking the news that a “China state-sponsored Advanced Persistent Threat (APT) actor” had breached their systems.
And that’s not even the craziest part! Wait till I tell you how they did it!
Well, I’m not going to keep it a secret. It was good, ol’ SQL injection. (More on SQL injection in a little bit.)
These US Treasury servers were/are protected, in part, by a Privileged Access Management (PAM) tool from Beyond Trust (which, I must say, is a fantastic name for a security company). Unfortunately for them (and, well, for all of us with records at the US Treasury (😅)), it would be Beyond Trust that would have to break the news to the government, as it was their software that served as the entry point for the hackers.
But let’s not all dogpile on and throw tomatoes at Beyond Trust. This could have been any of us. Because at the core of the vulnerability was PostgreSQL–one of the most commonly used relational databases in the world.
With one caveat—it’s not exactly a blatant SQL injection vulnerability. The attack requires you to be using the output of a Postgres internal string escaping method and feeding that directly into psql (the CLI tool for Postgres). Or how they put it:
> Specifically, SQL injection requires the application to use the function result to construct input to psql, the PostgreSQL interactive terminal. - From the PostgreSQL CVE Announcement
So how was there a zero-day in PostgreSQL, that had just been sitting there for at least 9 years, maybe longer? And not just that, but a SQL injection vulnerability?
For the uninitiated, SQL injection is a vulnerability as old as time.
“Do not cite the Deep Magic to me, Witch! I was there when it was written!” - Aslan (King of Narnia)
SQL injection has long been a cornerstone of security 101 for developers and security researchers alike. It’s, like, the first thing, in the first chapter, of every cybersecurity textbook, ever.
Take, for example, this horrible Ruby code.
# @Substack, please add syntax highlighting. I promise it wouldn't be that hard.
require 'pg'
# Connect to PostgreSQL database
conn = PG.connect(dbname: 'testdb', user: 'user', password: 'password')
# Get user input from command line
puts "Enter your username:"
username = gets.chomp
# Simple query vulnerable to SQL injection
result = conn.exec("SELECT * FROM users WHERE username = '#{username}'")
# Display the result
puts result.getvalue(0, 0) # Assuming the result has a column
Here, if the user enters admin' OR '1' = '1 as their username, the query that actually gets executed is:
SELECT * FROM users WHERE username = 'admin' OR '1' = '1'
Since '1' = '1' is always true, the WHERE clause matches every row, and this query will return all users in the database.
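If you want to watch the injection happen without spinning up a database, here is a minimal sketch. The build_query helper is hypothetical (it is not part of the pg gem); it just builds the query text the same way the vulnerable example does and prints it.

```ruby
# Hypothetical helper: builds the query text exactly like the vulnerable
# example above, but never talks to a database.
def build_query(username)
  "SELECT * FROM users WHERE username = '#{username}'"
end

benign    = build_query("admin")
malicious = build_query("admin' OR '1' = '1")

puts benign
puts malicious
```

The second line printed is the query from above: the quote in the input closes the string literal early, and everything after it is treated as SQL rather than data.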
This is a contrived example, but we can extrapolate how this pattern could be used for much more sinister things. Like Bobby Tables. (You thought I was going to talk about SQL injection without bringing up Bobby Tables??)
A better way to write this would be:
conn.exec_params("SELECT * FROM users WHERE username = $1", [username])
This code uses the pg gem’s exec_params method, which sends the username to the server as a separate parameter bound to our placeholder, $1, so it is never parsed as part of the SQL text.
NOTE: This improved Ruby code uses a parameterized query, which is often loosely referred to as a “prepared statement”. Parameterized queries and prepared statements are sort of the industry standard in protecting against SQL injections.
This category of attack is so well-known that its existence has almost become a given – it’s just something that all developers know to guard against. So, it’s nothing short of wild that a SQL injection vulnerability sat undiscovered in PostgreSQL, one of the most heavily scrutinized open source projects (up there with the Linux kernel), for ten years. In a system that has been used by countless developers and security experts, how could something so basic go unnoticed for so long?
Well, this SQL injection wasn’t as simple as our contrived example (or the case of bobby tables).
At the core of the entire attack was just two bytes: c0 27.
NOTE: The bytes c0 27 are displayed in hexadecimal.
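If hexadecimal isn’t your first language, here is a small sketch that prints the same two bytes in hex, decimal, and binary. The variable names are mine, purely for illustration.

```ruby
# Build the two-byte payload from its hexadecimal values.
payload = [0xC0, 0x27].pack("C*")

# Print each byte in hex, decimal, and binary.
payload.bytes.each do |byte|
  printf("hex %02x  decimal %3d  binary %08b\n", byte, byte, byte)
end

# On its own, 0x27 is the ASCII code for the single quote.
quote = 0x27.chr
puts quote
```

Keep an eye on that second byte. It becomes important later.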
Let’s walk down the stack trace.
The attacker included these two bytes in a code execution path which eventually hit a function: pg_escape_string.
pg_escape_string is a PHP function which is used to sanitize input. Beyond Trust was using this function in a file called dbquote.php.
So, basically, so far so good. Beyond Trust did their due diligence by properly calling a sanitization function on the user’s string input before using it in a PostgreSQL query.
But, continuing to walk down the stack trace…👇
pg_escape_string doesn’t actually do the string escaping itself. It calls a PostgreSQL function, PQescapeStringInternal.
The PQescapeStringInternal function is actually part of libpq, which is a C library that provides interfaces for interacting with a Postgres server, and which Postgres’s CLI tool, psql, is notably built on. You might also know libpq as “one of those libraries in your Dockerfile that you have to patch CVEs for in order to reach SOC 2 compliance”.
The PQescapeStringInternal function exists for this exact reason: to “escape” a string (in other words, sanitize it) and make it safe to use in Postgres. In order to do this, PQescapeStringInternal must call pg_utf_mblen. This function is used to find the length, in bytes, of multi-byte Unicode characters.
Now, let’s stop here and put the pg_utf_mblen function under a microscope.
Pretty much any text in PostgreSQL that needs escaping is going to pass through this function at one point or another.
// code as of 2025/03/13
int
pg_utf_mblen(const unsigned char *s)
{
    int len;

    if ((*s & 0x80) == 0)
        len = 1;
    else if ((*s & 0xe0) == 0xc0)
        len = 2;
    else if ((*s & 0xf0) == 0xe0)
        len = 3;
    else if ((*s & 0xf8) == 0xf0)
        len = 4;
#ifdef NOT_USED
    else if ((*s & 0xfc) == 0xf8)
        len = 5;
    else if ((*s & 0xfe) == 0xfc)
        len = 6;
#endif
    else
        len = 1;
    return len;
}
(Here’s a link to the code definition if you’re nerdy enough)
In short, we need this function because Postgres lets us use emojis. (Please don’t do this.)
CREATE TABLE "😀_table" (
    id serial PRIMARY KEY,
    description text
);
Although “visually” the emoji is only 1 character, Unicode UTF-8 encoding stores this as 4 bytes.
An ASCII character like A uses 1 byte.
A character like é (which is outside the ASCII range) uses 2 bytes.
A character like 中 (a Chinese character) uses 3 bytes.
The emoji 😀 is encoded as F0 9F 98 80 in UTF-8, which consists of 4 bytes.
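You don’t have to take my word for those byte counts. Here is a quick sketch that asks Ruby for them. The characters are written as \u escapes so the example doesn’t depend on how your editor saves the file, and the variable names are just for illustration.

```ruby
# "A", "é", "中", and "😀" written as Unicode escapes.
samples = ["A", "\u00E9", "\u4E2D", "\u{1F600}"]

samples.each do |char|
  hex = char.bytes.map { |b| format("%02X", b) }.join(" ")
  puts "#{char}: #{char.length} character, #{char.bytesize} byte(s): #{hex}"
end

sizes     = samples.map(&:bytesize)
emoji_hex = samples.last.bytes.map { |b| format("%02X", b) }.join(" ")
```

Every sample is a single character as far as String#length is concerned, but String#bytesize tells the real storage story.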
So, when parsing a Unicode character (like an emoji), Postgres needs to know how many bytes it’s made of.
Ok, so now let’s take a look back at our two bytes: c0 27.
Because our string starts with c0, pg_utf_mblen understands that the length of our Unicode character is 2 (you can see this in the if/else blocks above).
Postgres is able to do this check because it is in accordance with UTF-8 encoding. Basically, the first byte stores information about the total length of the encoded character.
0xc0, a.k.a. 11000000, matches the pattern 110xxxxx, which means there are two bytes.
This length of 2 is then used in PQescapeStringInternal.
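To make the lead-byte trick a bit more concrete, here is a rough Ruby transcription of the if/else chain from pg_utf_mblen, run against a handful of lead bytes. The claimed_length name is my own, and the NOT_USED branches are left out. Treat it as a sketch of the idea, not as Postgres code.

```ruby
# Rough Ruby transcription of the lead-byte checks in pg_utf_mblen.
# The method name is hypothetical; only the bit masks come from the C code.
def claimed_length(lead_byte)
  return 1 if (lead_byte & 0x80) == 0     # 0xxxxxxx: plain ASCII
  return 2 if (lead_byte & 0xE0) == 0xC0  # 110xxxxx
  return 3 if (lead_byte & 0xF0) == 0xE0  # 1110xxxx
  return 4 if (lead_byte & 0xF8) == 0xF0  # 11110xxx
  1                                       # anything else falls back to 1
end

# Lead bytes of "A", "é", "中", "😀", plus our payload's first byte, c0.
lead_bytes = [0x41, 0xC3, 0xE4, 0xF0, 0xC0]

lead_bytes.each do |byte|
  printf("0x%02X  %08b  claims %d byte(s)\n", byte, byte, claimed_length(byte))
end

lengths = lead_bytes.map { |byte| claimed_length(byte) }
```

Notice that the answer depends only on the first byte. Nothing here ever looks at the bytes that follow.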
Now here is where the bug happens.
The PQescapeStringInternal function doesn’t actually validate that the string it is parsing with pg_utf_mblen is valid Unicode. (It isn’t: in valid UTF-8, the second byte of a two-byte character must look like 10xxxxxx.) So, instead, it just takes the length of 2, and grabs the next byte. Which in our case is 27.
This 27 represents none other than the SQL injector’s most highly coveted character: ' (the single quote). PQescapeStringInternal blindly copies over this single quote, unescaped.
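Here is a toy model of that behavior in Ruby. To be loud about it: naive_escape and strict_escape are hypothetical functions I wrote for illustration, not the libpq implementation and not the actual patch. The sketch assumes an escaper that doubles single quotes but copies multi-byte characters through based purely on the length claimed by the lead byte.

```ruby
# Length claimed by the lead byte alone (same masks as pg_utf_mblen).
def claimed_length(byte)
  return 2 if (byte & 0xE0) == 0xC0
  return 3 if (byte & 0xF0) == 0xE0
  return 4 if (byte & 0xF8) == 0xF0
  1
end

QUOTE = 0x27

# Hypothetical, simplified escaper: doubles single quotes, but copies the
# bytes of a "multi-byte character" through without inspecting them.
def naive_escape(bytes)
  out = []
  i = 0
  while i < bytes.length
    len = claimed_length(bytes[i])
    if len == 1
      out << QUOTE if bytes[i] == QUOTE  # ' becomes ''
      out << bytes[i]
    else
      out.concat(bytes[i, len])          # trailing bytes trusted blindly
    end
    i += len
  end
  out
end

# Hypothetical stricter variant: refuse input that is not valid UTF-8.
def strict_escape(bytes)
  text = bytes.pack("C*").force_encoding("UTF-8")
  raise ArgumentError, "invalid UTF-8 input" unless text.valid_encoding?
  naive_escape(bytes)
end

ordinary = naive_escape("a'b".bytes)    # the quote gets doubled
smuggled = naive_escape([0xC0, 0x27])   # the quote rides along untouched
```

With ordinary input, the quote is doubled as expected. With c0 27, the quote is swallowed as the "second byte" of a character and comes out the other side unescaped. The strict variant simply raises on the payload, which is one possible defensive choice and an assumption on my part.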
Now, all of this might have been fine had Beyond Trust not written a feature which allowed users to directly, programmatically interact with psql (the Postgres command line interface).
The attackers were able to take this unescaped single quote and gain full control of psql. And, I didn’t know this until researching this incident, but as if an attacker getting access to your database via psql wasn’t already bad enough, it turns out that you can actually execute arbitrary system commands through psql:
\! <command>
// or...
\! ./break_all_the_things
So… yeah. That’s not good.
To wrap things up, again, only if you’re nerdy and sweaty enough, the fix happens here.
As I was investigating, I found that one YouTube user said it best:
Another good reference point on this is the Rust Book, Chapter 8.
To summarize, strings are complicated. Different programming languages make different choices about how to present this complexity to the programmer. Rust has chosen to make the correct handling of String data the default behavior for all Rust programs, which means programmers have to put more thought into handling UTF-8 data up front. This trade-off exposes more of the complexity of strings than is apparent in other programming languages, but it prevents you from having to handle errors involving non-ASCII characters later in your development life cycle.
After looking deep into the hacking of the US Treasury, I think it’s really clear how encoding, storage, and processing of strings can introduce unexpected behavior in any software, even PostgreSQL.
For example, Paul Butler wrote about how you can actually “Smuggle arbitrary data through an emoji”. So yeah. In theory, you can encode an unlimited amount of arbitrary data in any Unicode character. Think of all the possibilities!
All of this inspired me to learn more about strings and Unicode.
For example, did you know that Unicode characters are defined by the Unicode Standard, which is maintained by the Unicode Consortium?
Wow what a boring sentence!
But… did you know that for the small price of $5,000, you can be the sole adopter of a Unicode character?
Now we’re talkin’.
Almost 100 people have adopted a Unicode character at the “gold” level thus far. Some of these are in memory of loved ones, or big tech companies showing the Unicode Consortium some love.
Some of my favorites include:
The Oakland A’s (yes, the Major League Baseball team), spending $15,000 and adopting the baseball ⚾, elephant 🐘, and deciduous tree 🌳.
And Buffalo Wild Wings, who have adopted the poultry leg 🍗
Full list here
Anyways, this was my first blog post here on the Slam Dunk Software Substack. I hope you enjoyed it!
(This post was heavily inspired by PwnFunction’s YouTube video on the topic. More sources and further readings below.)