Hidden Messages in Emojis and Hacking the US Treasury

Original link: https://slamdunksoftware.substack.com/p/hidden-messages-in-emojis-and-hacking

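On December 30, the US Treasury notified lawmakers of a database breach carried out by a China-linked APT actor, who exploited a SQL injection vulnerability in a Privileged Access Management (PAM) tool from Beyond Trust. The tool uses the widely deployed PostgreSQL database. The vulnerability lies in how PostgreSQL handles string escaping. Specifically, the two-byte sequence “c0 27” triggers a flaw in the `pg_utf_mblen` function that allows an unescaped single quote (') to be injected into a query. This happens because `pg_utf_mblen` mistakes “c0 27” for a valid two-byte Unicode character and the single quote is copied blindly, bypassing sanitization. The attackers used this flaw to inject malicious SQL commands through the psql command-line interface, which led to arbitrary system command execution. The root cause is insufficient validation of Unicode characters during escaping. It highlights how complex string handling and Unicode encoding are, even in heavily scrutinized software.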


Original article

On December 30th, while most of us were preparing for a New Year’s Eve celebration, the US Treasury was prepping a notice to lawmakers to notify them that their systems, which (obviously) contain highly sensitive, confidential data, had been compromised.

(Honestly, I’m not sure how I missed this news. Usually I’m pretty plugged in, especially to, like, open-source software vulnerabilities that affect my government’s treasury department. 🤷‍♂️)

For compliance reasons, the US Treasury posted this notice to US lawmakers, breaking the news that a “China state-sponsored Advanced Persistent Threat (APT) actor” had breached their systems.

And that’s not even the craziest part! Wait till I tell you how they did it!

Well, I’m not going to keep it a secret. It was good, ol’ SQL injection. (More on SQL injection in a little bit.)

These US Treasury servers were/are protected, in part, by a Privileged Access Management (PAM) tool from Beyond Trust (which, I must say, is a fantastic name for a security company). Unfortunately for them (and, well, for all of us with records at the US Treasury (😅)), it would be Beyond Trust that would have to break the news to the government, as it was their software that served as the entry point for the hackers.

But let’s not all dogpile on and throw tomatoes at Beyond Trust. This could have been any of us. Because at the core of the vulnerability was PostgreSQL–one of the most commonly used relational databases in the world.

With one caveat—it’s not exactly a blatant SQL injection vulnerability. The attack requires you to be using the output of a Postgres internal string escaping method and feeding that directly into psql (the CLI tool for Postgres). Or how they put it:

> Specifically, SQL injection requires the application to use the function result to construct input to psql, the PostgreSQL interactive terminal. - From the PostgreSQL CVE Announcement

So how was there a zero-day in PostgreSQL, that had just been sitting there for at least 9 years, maybe longer? And not just that, but a SQL injection vulnerability?

For the uninitiated, SQL injection is a vulnerability as old as time.

“Do not cite the Deep Magic to me, Witch! I was there when it was written!” - Aslan (King of Narnia)

SQL injection has long been a cornerstone of security 101 for developers and security researchers alike. It’s, like, the first thing, in the first chapter, of every cybersecurity textbook, ever.

Take, for example, this horrible Ruby code.

# @Substack, please add syntax highlighting. I promise it wouldn't be that hard.
require 'pg'

# Connect to PostgreSQL database
conn = PG.connect(dbname: 'testdb', user: 'user', password: 'password')

# Get user input from command line
puts "Enter your username:"
username = gets.chomp

# Simple query vulnerable to SQL injection
result = conn.exec("SELECT * FROM users WHERE username = '#{username}'")

# Display the result
puts result.getvalue(0, 0) # Assuming the result has a column

Here, if the user enters admin' OR '1' = '1 as their username, the query that actually gets executed is:

SELECT * FROM users WHERE username = 'admin' OR '1' = '1'

Since '1' = '1' is always true, the WHERE clause matches every row, and this query will return all users in the database.

This is a contrived example, but we can extrapolate how this pattern could be used for much more sinister things. Like Bobby Tables. (You thought I was going to talk about SQL injection without bringing up Bobby Tables??)

A better way to write this would be:

conn.exec_params("SELECT * FROM users WHERE username = $1", [username])

This code uses the pg gem’s exec_params method, which sends the username to the server as a separate parameter bound to our placeholder, $1, instead of splicing it into the SQL text.

NOTE: This improved Ruby code is a parameterized query, which is often loosely referred to as a “prepared statement”. Parameterized queries and prepared statements are sort of the industry standard in protecting against SQL injections.
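To see the difference between the two approaches side by side without needing a Postgres server, here is a minimal sketch in Python. It swaps in the standard-library sqlite3 module for PostgreSQL, and the table layout, sample rows, and function names are made up for illustration.

```python
import sqlite3


def make_db():
    # In-memory database with a hypothetical users table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("admin", "admin@example.com"), ("alice", "alice@example.com")],
    )
    return conn


def find_user_unsafe(conn, username):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Parameterized: the value travels separately from the SQL text.
    query = "SELECT username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


conn = make_db()
payload = "admin' OR '1' = '1"

unsafe_rows = find_user_unsafe(conn, payload)  # matches every row
safe_rows = find_user_safe(conn, payload)      # matches nothing

print(len(unsafe_rows), len(safe_rows))
```

With the parameterized version, the whole payload is compared against the username column as one literal value, so the quote characters never get a chance to change the shape of the query.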

This category of attack is so well-known that its existence has almost become a given: it’s just something that all developers know to guard against. So, it’s nothing short of wild that a SQL injection vulnerability sat undiscovered in PostgreSQL, one of the most heavily scrutinized open source projects (up there with the Linux kernel), for roughly a decade. In a system that has been used by countless developers and security experts, how could something so basic go unnoticed for so long?

Well, this SQL injection wasn’t as simple as our contrived example (or the case of bobby tables).

At the core of the entire attack was just two bytes: c0 27.

NOTE: The bytes c0 27 are written in hexadecimal.

Let’s walk down the stack trace.

The attacker included these two bytes in a code execution path which eventually hit a method: pg_escape_string.

pg_escape_string is a PHP method which is used to sanitize input. Beyond Trust was using this method in a file called dbquote.php.

So, basically, so far so good. Beyond Trust did their due diligence by properly calling a sanitization method on the user’s string input before using it in a PostgreSQL query.

But, continuing to walk down the stack trace…👇

pg_escape_string doesn’t actually do the string escaping itself. It calls a PostgreSQL function, PQescapeStringInternal.

The PQescapeStringInternal method is actually part of libpq, a C library that provides the client interface to a Postgres server. Notably, Postgres’s CLI tool, psql, is built on it. You might also know libpq as “one of those libraries in your Dockerfile that you have to patch CVEs for in order to reach SOC 2 compliance”.

The PQescapeStringInternal method exists for this exact reason: to “escape” a string (in other words, sanitize it) so that it is safe to use in Postgres. In order to do this, PQescapeStringInternal must call pg_utf_mblen, which returns the length in bytes of a multi-byte Unicode character.

Now, let’s stop here and put the pg_utf_mblen method under a microscope.

Pretty much any text in PostgreSQL that needs escaping is going to pass through this method at one point or another.

// code as of 2025/03/13

int
pg_utf_mblen(const unsigned char *s)
{
	int			len;

	if ((*s & 0x80) == 0)
		len = 1;
	else if ((*s & 0xe0) == 0xc0)
		len = 2;
	else if ((*s & 0xf0) == 0xe0)
		len = 3;
	else if ((*s & 0xf8) == 0xf0)
		len = 4;
#ifdef NOT_USED
	else if ((*s & 0xfc) == 0xf8)
		len = 5;
	else if ((*s & 0xfe) == 0xfc)
		len = 6;
#endif
	else
		len = 1;
	return len;
}

(Here’s a link to the code definition if you’re nerdy enough)

In short, we need this method because Postgres lets us use emojis. (please don’t do this)

CREATE TABLE "😀_table" (
  id serial PRIMARY KEY,
  description text
);

Although “visually” the emoji is only 1 character, Unicode UTF-8 encoding stores this as 4 bytes.

  • An ASCII character like A uses 1 byte.

  • A character like é (which lives in Unicode’s Latin-1 Supplement block) uses 2 bytes.

  • A typical Chinese character uses 3 bytes.

  • The emoji 😀 is encoded as F0 9F 98 80 in UTF-8, which consists of 4 bytes.

So, when parsing a Unicode character (like an emoji), Postgres needs to know how many bytes it’s made of.
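The byte counts in the list above are easy to check. The sketch below is a Python port of the branch logic in pg_utf_mblen (the function name utf8_len_from_lead_byte is mine, not Postgres’s), compared against what Python’s own UTF-8 encoder produces. The Chinese sample character, U+4E2D, is my own pick for illustration.

```python
def utf8_len_from_lead_byte(lead: int) -> int:
    """Guess a character's byte length from its first byte only."""
    if (lead & 0x80) == 0:
        return 1  # 0xxxxxxx: plain ASCII
    if (lead & 0xE0) == 0xC0:
        return 2  # 110xxxxx
    if (lead & 0xF0) == 0xE0:
        return 3  # 1110xxxx
    if (lead & 0xF8) == 0xF0:
        return 4  # 11110xxx
    return 1      # anything else is treated as a single byte


# "A", e-acute, a Chinese character (my own sample), grinning-face emoji
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    guessed = utf8_len_from_lead_byte(encoded[0])
    # Print the code point, the raw bytes, the real length, and the guess.
    print(f"U+{ord(ch):04X}", encoded.hex(" "), len(encoded), guessed)
```

For well-formed UTF-8, the guess from the first byte always matches the real encoded length. The interesting question is what happens when the input is not well-formed.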

Ok, so now let’s take a look back at our two bytes: c0 27.

Because our string starts with c0, pg_utf_mblen concludes that the length of our Unicode character is 2 (you can see this in the if/else blocks above).

Postgres is able to do this check because UTF-8 stores the total length of a character in its first byte. 0xc0 is 11000000 in binary, which matches the 110xxxxx pattern that marks the start of a two-byte character.
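To make the bit patterns concrete, here is a small Python sketch that looks at both bytes. It also asks Python’s strict UTF-8 decoder for a second opinion. The helper name is_valid_utf8 is hypothetical, and Python’s decoder is only a stand-in for “a validating parser”, not anything Postgres uses.

```python
lead, second = 0xC0, 0x27

print(f"{lead:08b}")    # bit pattern of the first byte
print(f"{second:08b}")  # bit pattern of the second byte

# The first byte matches the 110xxxxx "two-byte character" pattern.
looks_like_two_byte_lead = (lead & 0xE0) == 0xC0

# A real continuation byte must match 10xxxxxx.
is_continuation = (second & 0xC0) == 0x80


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(looks_like_two_byte_lead, is_continuation, is_valid_utf8(bytes([lead, second])))
```

So the first byte promises a two-byte character, but the byte that follows is ordinary ASCII rather than a continuation byte, and a strict decoder refuses the pair.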

This length of 2 is then used in PQescapeStringInternal.

Now here is where the bug happens.

The PQescapeStringInternal method doesn’t actually validate that the string it is parsing with pg_utf_mblen is valid Unicode. In valid UTF-8, the second byte of a two-byte character must look like 10xxxxxx, and 27 (00100111 in binary) does not. Nothing checks this, so the method just takes the length of 2 and grabs the next byte, which in our case is 27.

In ASCII, hexadecimal 27 is none other than the SQL injector’s most highly coveted character: ' (the single quote). PQescapeStringInternal blindly copies over this single quote, unescaped.
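Here is a simplified Python model of that behavior. It is not libpq’s actual code: the function names are mine, it only handles quote doubling (the real routine does more), and the “checked” variant is just one possible mitigation, not necessarily what the PostgreSQL patch does.

```python
def utf8_len_from_lead_byte(lead: int) -> int:
    # Same first-byte length guess as pg_utf_mblen.
    if (lead & 0x80) == 0:
        return 1
    if (lead & 0xE0) == 0xC0:
        return 2
    if (lead & 0xF0) == 0xE0:
        return 3
    if (lead & 0xF8) == 0xF0:
        return 4
    return 1


def escape_naive(raw: bytes) -> bytes:
    """Double single quotes, trusting the lead byte for character lengths."""
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:
            # ASCII path: a quote gets doubled, everything else is copied.
            out += b"''" if b == 0x27 else bytes([b])
            i += 1
        else:
            # Multi-byte path: copy the claimed length without validation.
            n = utf8_len_from_lead_byte(b)
            out += raw[i:i + n]
            i += n
    return bytes(out)


def escape_checked(raw: bytes) -> bytes:
    """Reject invalid UTF-8 first, then double single quotes."""
    raw.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    return raw.replace(b"'", b"''")


payload = bytes.fromhex("c027") + b"; --"
print(escape_naive(b"it's"))   # the ordinary quote is doubled
print(escape_naive(payload))   # the quote after c0 is not
```

In the naive version, the quote hides inside what the escaper believes is a single two-byte character, so it never reaches the branch that doubles quotes.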

Now, all of this might have been fine had Beyond Trust not written a feature which allowed users to directly, programmatically interact with psql (the Postgres command line interface).

The attackers were able to take this unescaped single quote and gain full control of psql. And, I didn’t know this until researching this incident, but as if an attacker getting access to your database via psql wasn’t already bad enough, it turns out that you can actually execute arbitrary system commands through psql:

\! <command>

// or...

\! ./break_all_the_things

So… yeah. That’s not good.

To wrap things up, again, only if you’re nerdy and sweaty enough, the fix happens here.

As I was investigating, I found that one YouTube user said it best:

Another good reference point on this is the Rust Book, Chapter 8.

> To summarize, strings are complicated. Different programming languages make different choices about how to present this complexity to the programmer. Rust has chosen to make the correct handling of String data the default behavior for all Rust programs, which means programmers have to put more thought into handling UTF-8 data up front. This trade-off exposes more of the complexity of strings than is apparent in other programming languages, but it prevents you from having to handle errors involving non-ASCII characters later in your development life cycle.

After looking deep into the hacking of the US Treasury, I think it’s really clear how encoding, storage, and processing of strings can introduce unexpected behavior in any software, even PostgreSQL.

For example, Paul Butler wrote about how you can actually “Smuggle arbitrary data through an emoji”. So yeah. In theory, you can encode an unlimited amount of arbitrary data in any Unicode character. Think of all the possibilities!
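As I understand the trick, it relies on Unicode variation selectors: 256 invisible code points that can trail any character, which is conveniently one per possible byte value. Below is a minimal Python sketch of that idea. The function names and the exact byte-to-selector mapping are my own assumptions, not a copy of Paul Butler’s code.

```python
def byte_to_selector(b: int) -> str:
    # Bytes 0-15 map to U+FE00..U+FE0F, bytes 16-255 to U+E0100..U+E01EF.
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)


def selector_to_byte(ch: str):
    # Inverse mapping; returns None for anything that is not a selector.
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def hide(carrier: str, secret: bytes) -> str:
    """Append one invisible variation selector per secret byte."""
    return carrier + "".join(byte_to_selector(b) for b in secret)


def reveal(text: str) -> bytes:
    """Collect the bytes carried by any variation selectors in the text."""
    decoded = (selector_to_byte(ch) for ch in text)
    return bytes(b for b in decoded if b is not None)


hidden = hide("\U0001F600", b"hello")  # carrier is the grinning-face emoji
print(len(hidden))      # one visible code point plus five invisible ones
print(reveal(hidden))
```

Most renderers should show the result as a lone emoji, while the extra code points ride along through copy and paste.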

All of this inspired me to learn more about strings and Unicode.

For example, did you know that Unicode characters are defined by the Unicode Standard, which is maintained by the Unicode Consortium?

Wow what a boring sentence!

But… did you know that for the small price of $5,000, you can be the sole adopter of a Unicode character?

Now we’re talkin’.

Almost 100 people have adopted a Unicode character at the “gold” level thus far. Some of these are in memory of loved ones; others are big tech companies showing the Unicode Consortium some love.

Some of my favorites include:

  • The Oakland A’s (yes, the Major League Baseball team), spending $15,000 and adopting the baseball ⚾, elephant 🐘, and deciduous tree 🌳.

  • And Buffalo Wild Wings, who have adopted the poultry leg 🍗.

  • Full list here

Anyways, this was my first blog post here on the Slam Dunk Software substack. I hope you enjoyed it!

(This post was heavily inspired by PwnFunction’s YouTube video on the topic. More sources and further readings below.)
