(comments)

Original link: https://news.ycombinator.com/item?id=39891607

The XZ data compression utility contained a backdoor that made use of Glibc's IFUNC. The flaw allowed unauthorized access and drew broad attention across Linux distributions. At least three popular distributions (Arch, Gentoo, and openSUSE Tumbleweed) shipped the backdoor for weeks. The impact is significant, since a backdoored ssh could affect a large number of servers. Beyond the immediate severity, source distribution tarballs containing code that differs from the repository remain a concern. Autogenerated artifacts should be committed and kept up to date to minimize risk. In addition, avoiding tools like autotools and reducing dependencies helps strengthen security. Distribution maintainers who carry large patch sets pose a challenge, creating de facto forks and unfunded maintenance. Addressing developers' financial problems is essential to ensuring quality and reliability. Geopolitical factors can affect a developer's trustworthiness, which calls for caution and diligent vetting.

Related articles

Original article


What I haven't seen discussed much is the linking mechanism that allowed the lib to hook into RSA_public_decrypt. Plenty of talk about what could or could not be achieved by even more process separation and the like, but little about that function call redirect. Could it be possible to establish a way to link critical components like the code for incoming ssh with libraries in some tiered trust way? "I trust you when and where I call you, but I won't allow you to introduce yourself to other call sites"?

This would surely fall into the category of "there would be ways around it, so why bother?" that triggers a "by obscurity" reflex in many, but I'd consider it reduced attack surface.



An Erlang-style actor model, where a program's subcomponents call each other via message passing and each has its own security context, may work.

However, there are multiple security contexts at play in an operating system; with regard to the XZ backdoor it's mostly about capability-based security at the module level, but you also have capabilities at the program level, isolation at the memory level (paging), isolation at the microarchitectural level, and so on. Ensuring all of these elements work together while still delivering performance seems rather challenging, and it's definitely not possible for the Unix-likes to move to such a model, because it would mean replacing the concept of processes.



It means the libraries are only loaded when they are needed, so if you never use the (e.g.) xz compression feature, the xz library will not be loaded, and a backdoor added in the xz library simply can't trigger.

(Another side note is this may change the initialization order of libraries--so the initialization functions of an xz library don't run until xz is first used, and this may fail to let you intercept the ssh routines in time.)
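
For anyone unfamiliar with what lazy loading looks like in practice, here is a minimal sketch in C using dlopen/dlsym. The library name and symbol are illustrative, not the actual sshd/libsystemd wiring; the point is that the library's constructors (and any IFUNC resolvers) only run when the feature is first used:

    #include <dlfcn.h>
    #include <stdio.h>
    #include <stddef.h>

    /* Hypothetical signature for a decompress routine we only need sometimes. */
    typedef int (*decompress_fn)(const void *in, size_t in_len,
                                 void *out, size_t *out_len);

    static decompress_fn get_decompress(void)
    {
        static decompress_fn cached;
        if (!cached) {
            /* The library's init code runs here, at first use,
             * not at program startup. */
            void *h = dlopen("liblzma.so.5", RTLD_NOW | RTLD_LOCAL);
            if (!h) {
                fprintf(stderr, "dlopen failed: %s\n", dlerror());
                return NULL;
            }
            cached = (decompress_fn)dlsym(h, "xz_decompress"); /* illustrative symbol name */
        }
        return cached;
    }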



From what I can tell the problem is the use of glibc's IFUNC. It broke the testing for XZ; suddenly accounts appeared lobbying to disable that testing, which enabled the exploit.


IFUNC is arguably not the real issue. IFUNC was used to create a function that will be called on library load (since you need a resolver function to decide which implementation to map in). There are other ways to create a "callback on library load" as well. I think ifunc was actually used for obfuscation instead. Jia Tan created a somewhat plausible scenario of using ifuncs to select which CRC function to use at load time for performance reasons (whether that actually improves performance is arguable, but it's at least plausible). The malicious version swapped the resolver function for a malicious one, but either way it's just a way to create a function that can be called on library init.

The actual hook for intercepting the call was done via audit hooks.

So I guess it's really two things working together.
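
As a reference for readers who haven't seen one, a minimal GNU IFUNC looks roughly like this in C. The CRC function names and the CPU-feature check are made up for illustration; the relevant property, as described above, is that the resolver is executed by the dynamic linker at load time:

    #include <stdint.h>
    #include <stddef.h>

    /* Two hypothetical implementations of the same routine. */
    static uint32_t crc32_generic(const void *buf, size_t len) { (void)buf; (void)len; return 0; }
    static uint32_t crc32_clmul(const void *buf, size_t len)   { (void)buf; (void)len; return 0; }

    /* The resolver runs when the dynamic linker binds crc32_ifunc, i.e. at
     * library load. Whatever code it contains executes at that point, which
     * is the property that matters here, not the CPU dispatch itself. */
    static uint32_t (*resolve_crc32(void))(const void *, size_t)
    {
        __builtin_cpu_init();
        return __builtin_cpu_supports("pclmul") ? crc32_clmul : crc32_generic;
    }

    uint32_t crc32_ifunc(const void *buf, size_t len)
        __attribute__((ifunc("resolve_crc32")));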



    The goal is to use a standardized test framework to ease writing of tests in XZ. 
    Much of the functionality remains untested, so it will be helpful for long term project stability to have more tests
    
    -- Jia, 2022-06-17
This was a long time in the making.


Why do they say "almost" infected the world? At least 3 quite popular Linux distributions (arch, gentoo, and opensuse tumbleweed) ended up shipping the backdoor _for weeks_ , and it was most definitely working in at least tumbleweed. For weeks! A backdoored ssh! Hardly "almost".


Arch and Gentoo are fairly popular as hobbyist distributions but they're far less common in professional use, especially for the servers running SSH which this attack targeted. That doesn't mean what happened is in any way okay, but if this had gone unnoticed long enough to make it into RHEL or Debian/Ubuntu stable you would be hearing about it in notifications from your bank, healthcare providers, etc. A pre-auth RCE would mean anyone who doesn't have a tightly-restricted network and robust flow logging would struggle to say that they hadn't been affected.


Arch and Gentoo were also not supported, although the code shipped, because the exploit explicitly checked for RPM- and deb-packaged distros.

Suse is RPM based, but I don't remember whether the check was for the utilities or used another method. Suse uses zypper for package management, as opposed to yum/dnf on the far more popular RedHat-based distros, so it depends on how the exploit checked.



It might be riskier (because you'd have to identify yourself with government documents) to plant a backdoor in a similar way at a large, proprietary software vendor like Microsoft. But I don't know that it would be harder. And in the case of proprietary software there would not be nearly as much public scrutiny, and the scrutinizing public would have fewer resources for inspecting it.


Either way, it gives more jobs and $$$ to software developers in general. I'm fine with both :)

Just imagine how many more jobs will be created if every large company decides to roll their own stuff. A lot are actually doing this, but not enough.



If that is the takeaway the industry takes from this, it will be a huge mistake. We are talking about this at all precisely because it was open source. Commercial closed source software can simply be assumed to be compromised. We know of enough instances of it happening that if you still have a knee-jerk "oh that sounds like a conspiracy theory" reaction to that claim, you need to recalibrate your conspiracy theory meter, quickly.


This might solve the original author's issues, AND might also attract other people to do the job. The more people, the more eyes. It's definitely not a silver bullet, but I would be surprised that OSS maintainers are fine with the current financial arrangement, or lack of it.


Money itself doesn't necessarily cure mental health issues. I mean, it usually doesn't hurt, but it's not like you can blend cash up into a smoothie and cure depression. (Yes, that's a South Park reference.)


I bet $101 that we find something similar in the wild in the next 12 months as the maintainers start to look at each other's past commits with suspicion.


I wonder if we'll find the cases that were done and used, because if I had something like this and it worked, afterwards I'd "find it" with another account and get it fixed ...


My personal takeaways from this:

1. Source distribution tarballs that contain code different from what's in the source repository are bad; we should move away from them. The other big supply chain attack (event-stream) also took advantage of something similar.

1a. As a consequence of (1) autogenerated artifacts should always be committed.

2. Autogenerated artifacts that everyone page-downs past during code reviews are a problem. If you have this type of stuff in your repository, also have an automatic test that checks that nobody tampered with it (it will also keep you from having stale autogenerated files in your repository).

3. A corollary of (1) and (2) is that autotools is bad and the autotools culture is bad.

4. Libsystemd is a problem for the ecosystem. People get dismissed as systemd haters for pointing this out but it's big, complicated, has a lot of dependencies and most programs use a tiny fraction of it. Encouraging every service to depend on it for initialization notifications is insane.

5. In general there's a culture that code reuse is always good, that depending on large libraries for small amounts of functionality is good. This is not true: dependencies are a maintenance burden and a security risk, and this needs to be weighed against the functionality they bring in.

6. Distro maintainers applying substantial patches to packages is a problem, it creates widely used de facto forks for libraries and applications that do not have real maintainers looking at them.

7. We need to make OSS work from the financial point of view for developers. Liblzma and xz-utils probably have tens of millions of installs but a single maintainer with mental health problems.

8. This sucks to say, but code reviews and handing off maintainership, at the moment, need to take into account geopolitical considerations.



> 8. This sucks to say, but code reviews and handing off maintainership, at the moment, need to take into account geopolitical considerations.

That won't help. There's no evidence that Jia Tan is a real name, or even a real person for that matter. If projects stop accepting contributions from Asian-sounding names, the next attack will just use Richard Jones as a name.



I think you could interpret this as "you need to know, personally, the party you're handing this off to, and make reasonable judgments as to whether or not they could be easily compromised by bad actors".

Like, meeting someone at several dev conferences should be a requirement at the very least.



> Like, meeting someone at several dev conferences should be a requirement at the very least.

This is utterly and completely unfeasible. Most open source maintainers, especially those that are struggling and are pressured to hand-off maintenance, don't have the time, means and will to travel to meet up with prospective co-maintainers, not just once but multiple times.

In practice it would just result in projects getting abandoned, the prospective co-maintainer starting a fork, and everyone switching to use the fork.



Intelligence agencies mastered fooling people about their bona fides in person a long time ago. Meeting someone in person will stop casual individuals who just want to crash the project for the lols or some other personal-level reason, but it would have been barely a bump in the road for this clearly-nation-state-level attack.


It adds another layer of complexity, though, and when someone trips up (and eventually, they will), it lets us know who is doing what and why. It also adds another layer of expense and vulnerability. Part of the beauty of cyberattacks for intelligence agencies is that they are very light on tradecraft. This helps to reduce that advantage.

We actually know who poisoned Alexander Litvinenko and what they're up to today, for example.[0]

[0]https://www.bbc.com/news/uk-35370621



You could probably get about 80% of the job done with just 20% of the work. I’m not in OSS but I hire technical people remotely, and I know many people that do. Some consultant friends have caught blatant scams within the first couple of rounds of interviews on Zoom.


>I’m not in OSS but I hire technical people remotely, [...] interviews on Zoom.

Different situations with different incentives and psychology:

- potential to receive money : job candidates are willing to get on Zoom calls or meet in person because they want a paycheck.

- no money involved & volunteer for free : potential open source contributors are not interested in getting on Zoom calls for $0 pay.



Also, while I'm not a lawyer, and in general the bar is very high to criminally prosecute an employee for doing "a bad job," I wouldn't be surprised if there are major jurisdictions where intentionally backdooring your employer's code could land you in prison.


Interviewing is a different situation though, because you start having essentially no relationship with the interviewee, and you haven't seen their work. OSS projects don't just add everyone that asks as a co-maintainer. Usually it's someone that has contributed to the project for a while already, and through that has shown they understand the code, are capable of improving it, and can make sensible decisions about the road to take.


Yea, I get that, from what I see in the details of this case the contributor was very competent. What I'm questioning here is whether in OSS projects there is enough face-to-face communication, probing about values, etc.


I don't think it's crazy for a maintainer to Google the person a bit, and if there is no positive match, ask the other person for at least a little bit of detail about themselves, like where they live (country/city), who they work for, etc. Maybe hop on a phone call or something.

In this case, Jia Tan just doesn't seem to match any real person we can find online. It's not like there's an elaborate online persona that they have really built.

While I don't want to put Lasse Collin on trial since he's a victim too, I do think he owes the community an update and explanation of what went down. It's not because we want to point fingers at him, but to learn from the experience.



Really, it starts before things get bad. This thing where - in the famous XKCD example - a single guy is thanklessly maintaining a project for 20 years in Nebraska needs to stop. Software libraries like these are no longer a one-person job. They can't be for the bus factor alone. Major projects like Linux distros or bigger foundations like Apache or Mozilla need to start harping on people hard to contribute to important libraries. We'll get to whatever the buzzword of the day is once we do the important work first.

Find a way to make it happen. Sorry, "I just can't" isn't going to cut it after this.



Is the argument that well known software should be taken over by professionals? There are many motivated software maintainers, including single guys in Nebraska, who have better operational security than well funded companies.

Remember the recent incident with the signing keys at Microsoft? Or the one before that? And these are the biggest, most well funded, companies on Earth we are talking about.

Organizations such as Let's Encrypt work well because they are staffed with motivated and competent people, not because they are well funded. This is not a problem that can be solved with funding alone.



I agree we need to stop depending on the 20-year hobby project of the guy in Nebraska, but adding barriers (which requiring travel and in-person meetings is) to sharing the load is not the solution. What these projects need is the necessary resources (mostly money) for multiple people to work on it professionally.


What I don't understand is: where are all the code and security contributions from Big N and other multi-billion-dollar companies with international scale and users? Do they all have their own internal fork of every major library? If not, you would think it would be in their own financial interest to keep somebody on payroll to maintain fundamental libraries like this.


I think this is a really important point. Every commercial contract I've been involved in has clauses that are intended to mitigate supplier risk, eg that they go out of business, and the contracts people do due diligence on suppliers to vet that they are who they say, try to eliminate the one-person operations, and generally mitigate risk if they really need the code but the only supplier is a tiny startup.

Perhaps large corpos need to apply their standard risk mitigation lens to their supply chain. Their stack or their security depends on these 390 packages. 27 of them have less than 3 maintainers. Recommendation: find alternatives.



> bigger foundations like Apache or Mozilla

What bigger foundations? The Apache foundation has a yearly revenue of $2.1 million. Why do you think they reacted to log4j the way they did? There are no resources.

Open source is running on fumes.



pretty much this.

That's why for whatever anyone thinks of Theo's antics, I appreciated the OpenSSL/LibreSSL Valhalla blogs and overall effort to do something about it.

TBH I'm amazed that, in its current state, Apache took in Pekko (FKA JVM Akka...); part of me is guessing it's because some of their other infra is dependent on it...

Foundation based OSS is on fumes. Open core... I am still hopeful for on many levels.



> 2) Simply meeting IRL is a terrible proxy for credibility.

Disagree; trust your intuition, but you can never do that if you never meet IRL.

Also, it's not racist or xenophobic to recognize that some countries exercise nearly complete control over their citizens (and sometimes indirectly over non-citizens), and that those people could be putting themselves at extreme personal risk by disobeying those dictates (assuming they did disagree, which doesn't seem to be a given)



> WRT 2, no, it's not.

> Trust your intuition, but you can never do that if you never meet IRL.

I'm sure Edward Snowden also met up with colleagues in the office at least a few times. May have even passed a security clearance.

> Also, it's not racist or xenophobic to recognize that some countries exercise nearly complete control over their citizens (and sometimes indirectly over non-citizens), and that those people could be putting themselves at extreme personal risk by disobeying those dictates, if they even disagreed with them.

Hold up, where did I make this claim about national origin/external pressure?

I'm only suggesting if you have pets, a kid, or a project at work, conferences take a non-zero amount of time to plan to attend.

Plus, what conference options even exist if you're finding other people for the xz library? Searching for #CompressionConf2024 isn't turning up much.



> I'm sure Edward Snowden also met up with colleagues in the office at least a few times. May have even passed a security clearance.

And that's why we know who Edward Snowden is. That's more than we can say about Jia Tan.

Say what you will about what he did and why, it is going to be very, very hard for someone to explain to a contract's security auditor why, in the year 2024, a commit from an account known to belong to Edward Snowden is in the source code of security-critical software.

And that's what FOSS-based companies and orgs need to start doing after this. If I'm working for Debian/Mozilla/Apache/wherever, I'm going to start asking project maintainers more about who they are. "Hey man, we've got an all-expenses-paid trip to one of the major conferences this year, which one can we put you down for?" needs to come out of someone's mouth at some point, and excluding some very good reasons and evidence for why they can't appear at one of these events in-person (think health or long-term family obligation reasons, confirmed by multiple people who know the maintainer), they need to be at one or more meetings within a reasonable amount of time. Randomly-timed remote video meetings could work in a pinch.

If they can't after a couple of years, then these projects need to inform the maintainers that they'll be forking the project and putting it under a maintainer who can be verified as a living, breathing, single person.

Repeat until there's at least some idea of who's working on most of these projects that make up critical systems that society is built upon.



Let's use the current theory that this is a state sponsored attack. If that's the case, another Jia Tan will be recruited. The identity of a single person simply doesn't matter. All that matters is that the attack was attempted.

Consider the issue of candidates who lie in the interviewing process by hiring other people to interview on their behalf. Now replace "interview" with "attend conference". This is just adding another vector of blind trust waiting to be abused.



It raises the bar quite a bit.

Especially when you've met Jia Tan and the new Jia Tan is obviously not the same person.

Meeting in person is quite literally the opposite of blind trust. Blind trust would be assuming that the person physically sitting on the other end of the internet connection and controlling Jia Tan's keys is the same Jia Tan you had lunch with a few months ago.



You cannot prove the person behind the keyboard is the person who is meeting up with people.

This is blind trust because of the assumption that the person is the same.



> Also, it's not racist or xenophobic to recognize that some countries exercise nearly complete control over their citizens (and sometimes indirectly over non-citizens), and that those people could be putting themselves at extreme personal risk by disobeying those dictates (assuming they did disagree, which doesn't seem to be a given)

This is true even when they are no longer in that country. Some governments are known to threaten the family of expatriates. "Do this for us or mom and dad are going to spend the rest of their soon to be short lives doing hard labor" is a pretty tough threat to ignore.



It's naive to believe that any form of physical presence means someone isn't going to do something nefarious in the eyes of the project.

This problem can only be solved by more skilled eyes on the projects that we rely on. How do we get there? shrug.gif.

Anything less is trying to find a cheap and ineffective shortcut in this trust model.



> It's naive to believe that any form of physical presence means someone isn't going to do something nefarious in the eyes of the project.

It's not the only thing, but it is something.

There's a lot of social engineering that went into the xz backdoor[0]. This started years ago; Jia Tan was posting in projects and suddenly someone appeared to pressure projects to accept their code. Who's Jia Tan? Who's Jigar Kumar, the person who is pressuring others to accept patches from Jia Tan? We don't know. Probably some person or group sponsored by a state APT, but we don't know for sure, because they're currently just text on a screen.

Having this person or group of people have to continually commit to the bit of publicly-known open-source maintainer who attends conferences, has an actual face, and is on security camera footage at multiple hotels and airports is far, far harder than just talking a vulnerable person into allowing maintainer access on a repository. Making them show up to different places a few times adds a layer of identity. Otherwise these "skilled eyes" could be anyone with a wide variety of motivations.

[0]https://boehs.org/node/everything-i-know-about-the-xz-backdo...



> Having this person or group of people have to continually commit to the bit of publicly-known open-source maintainer who attends conferences,

This is assuming maintainers even care/want to go.

> has an actual face, and is on security camera footage at multiple hotels and airports

The same footage that'll get wiped a few weeks after the conference ends, and quickly becomes not useful.

This is wonderful posturing in the name of security theater but doesn't solve anything.



> This is assuming maintainers even care/want to go.

If they don't want to go, don't use their project. Sorry, these aren't the TI-83 games you passed around at your high school with programming cables; they're the code libraries our society is built on. If my project relies on your project, I need to know who you are. If I can't figure that out, I'll try to find another one.

> The same footage that'll get wiped a few weeks after the conference ends, and quickly becomes not useful.

> This is wonderful posturing in the name of security theater but doesn't solve anything.

Along with receipts, eyewitnesses, plane tickets, etc. that put a person at a place at a time. Doesn't all have to be digital evidence.



You have a good point, but there's also a reason why companies like people to come into work and don't hire remotely as much as they should (or could). There's a reason why interviews often include a meal together. Meeting people IRL is good for building trust, on both sides.


That's not what the above commenter said. This may be your interpretation but the above commenter is essentially saying "don't work with Chinese-sounding developers" and is the completely wrong take here. Jia Tan may or may not be Chinese but the core issue is the lack of basic vetting to make sure he/she/they are a real person.


I feel a census coming on.

There needs to be a reckoning of who is doing what where on this sort of thing. After this whole fiasco you'll probably see more contracts wanting to know who's working on these things, and that will, in turn, have people auditing their software's packages.



Big vendors should pay to get to know them, because they're the ones making the money off of the developers' work, but "I don't want to meet anybody and want to just manage the project" is the FOSS version of "just trust me bro".


I don't think that #8 implies that projects should stop accepting contributions from Asian-sounding names. To me it means that people should be more careful about who they give access to. It doesn't matter if it was China or some other state or organization pretending to be China; the problem is that people assume an open source contributor will act altruistically and don't expect that they can be a malicious entity.


And to build on your point (hopefully), one way of understanding #8 is that it's not out of the question that bad actors have the time, resources, and patience to coordinate long-term campaigns of significant subtlety, the type more easily pulled off by a state actor. Facts such as those should inform our presumptions about when and where people enjoy the benefit of the doubt.


For example, we hope that Linus is not a long-term agent of the Suojelupoliisi - but how would you prove it?

Ideally, the "proof is in the code" and the review setup is strong enough that it could handle a Compromised Linus™, even if it couldn't handle multiple compromises.



I mean I would hope that there's a way to separate out the Linuses from the Jia Tans. But it's no longer out of the question that a campaign can build up an account or accounts with long-term histories of good standing that really challenge our intuitions.

But I suppose you are right, the best backstop is for the proof to be in the code.



> 4. Libsystemd is a problem for the ecosystem. People get dismissed as systemd haters for pointing this out but it's big, complicated, has a lot of dependencies and most programs use a tiny fraction of it. Encouraging every service to depend on it for initialization notifications is insane.

I couldn't agree more. Coming from the BSD world, systemd is a shock to the system; it's monstrous and has tendrils everywhere.



I read recently that systemd does not recommend that you link to libsystemd to participate in systemd-notify message passing. The API for talking to it is quite simple, and vendors are encouraged to implement a compliant interface rather than loading all of libsystemd into their programs to manage this. This of course would mean maintaining your own compliant interface as API changes happen, which is likely why it isn't done more frequently. It seems to me that there would be a lot of value in systemd stubbing out libraries for the various functions so that dependent projects could link to only the specific parts of systemd they need. That, or some other way to configure what code gets loaded when linking libsystemd. Full disclosure, I've not looked at libsystemd to see if this is already possible or if there are other recommendations by the project.


If you actually come from BSD, you'd hopefully recognize a set of different utilities combined to form a holistic system released under a single name. It's not a new idea.

Besides, the gpp is incorrect: systemd dependencies are not needed for initialisation notifications.



I think there's a low-effort solution to GP: Just split off the notification function for now.

There's a dilemma here: Make a huge number of tiny libraries and people complain about left-pad. Make a monolith and this type of attack can happen. If left-pad is more preventable, let's go that way. The fact that C and C++ have tons of overhead in producing a package is their problem to deal with through better tooling.



> Make a huge number of tiny libraries and people complain about left-pad.

Making a number of similar tiny libraries that would be better served as some sort of common set is a problem (i.e. even at the most basic level, right pad and left pad can be in one thing, RIGHT?)... but at the same time it's a particularly bad example, because the overall behavior of that trend was a form of influencer growth hacking.

that said, I think something like a 'notification function' falls into the category of 'boundary API' and those should always be segregated where possible for security as well as maintenance purposes for all parties.



> I think there's a low-effort solution to GP: Just split off the notification function for now.

100% agree that some of the functionality could be decoupled, and either the project should provide independent helper libs or at least do a better job of documenting the interfaces.

In this specific case, the notification interface is documented (and there's client implementations in a bunch of languages).



More personal observations:

8. Consumers are naive, yes. But the software industry itself is naive about the security threat.

9. The social exploit is part of the code exploit.

10. The FOSS axiom "More Eyes On The Code" works, but only if the "eyes" are educated. FOSS needs material support from industry. A MSFT engineer caught this exploit, but it still was released to G.A. in Fedora 41, openSUSE, and Kali.

11. The dev toolchain and testing process were never conceived to test for security. (edit: Also see Solarwinds [1] )

= = =

[1] _ https://www.wired.com/story/the-untold-story-of-solarwinds-t...



> 10. The FOSS axiom "More Eyes On The Code" works, but only if the "eyes" are educated.

One thing that could help with this is if somebody points an LLM at all these foundational repositories, prompted with "does this code change introduce any security issues?".



Re: 1. People keep saying this: "We should stop distributing tarballs." It's an argument that completely ignores why we have release artifacts in the first place. A release artifact contains more than just autoconf scripts.

There can be many reasons to include binary blobs in a release archive. Game resources, firmware images, test cases. There was today a comment that mpv includes parts of media files generated with proprietary encoders as test cases. That's good, not bad.

The well maintained library sqlite is everywhere, and has an excellent test suite. They release not one but two tarballs with every release, for different stages of compilation. It would be trivial to stop doing this, but it would make maintaining packages more work, which does nothing to improve security.

The reason Debian builds from curated tarballs is that they are curated by a human, and signed with a well-known key. They could certainly build from git instead. But would that improve the situation? Not all projects sign their release tags. And for those that do, it is more likely to be automated. We the collective want changes to be vetted first by the upstream maintainer, then by the package maintainer, and would prefer these entities to be unrelated.

This time the process was successfully attacked by a corrupt upstream maintainer, but that does not mean we should do away with upstream maintainers. Several backdoor attempts have been stopped over the years by this arrangement and that process is not something we should throw away without careful consideration.

The same improvements we have been talking about for years must continue: We should strive for more reproducible builds. We should strive for lower attack surface and decrease build complexity when possible. We should trust our maintainers, but verify their work.



> 1a. As a consequence of (1) autogenerated artifacts should always be committed.

Why don't object files and binaries count as autogenerated artifacts? Should we commit those to the repo too? Where is the line between an artifact that should be committed, and one that shouldn't be?

> 4. Libsystemd is a problem for the ecosystem.

libc will dynamically load libnss-* on a lot of platforms, some of which can link to a bunch of other helper libraries. What if the attack had come via one of those 2-or-3-dependencies-removed libraries? libc is big and complicated and most programs only use a tiny fraction of it. Is libc a problem for the ecosystem?



> Why don't object files and binaries count as autogenerated artifacts? Should we commit those to the repo too? Where is the line between an artifact that should be committed, and one that shouldn't be?

I'd say anything that is input to the compiler should be.

> libc will dynamically load libnss-* on a lot of platforms, some of which can link to a bunch of other helper libraries. What if the attack had come via one of those 2-or-3-dependencies-removed libraries? libc is big and complicated and most programs only use a tiny fraction of it. Is libc a problem for the ecosystem?

Yes, the libnss stuff is also a problem.



> libc is big and complicated and most programs only use a tiny fraction of it. Is libc a problem for the ecosystem?

IMO yes. I definitely believe having basic common functionality (malloc, printf, memcpy etc.) provided by one library, with all the crazy/obscure stuff that very few people need or want living somewhere else, would be an improvement.



> Is libc a problem for the ecosystem?

Absolutely yes. And also the size of the kernel.

Those two currently have a much better guaranteed quality than systemd, thus systemd is a much more pressing issue. But they don't stop being a problem just because they are not the largest one.



> the size of the kernel

Is mostly in the hardware support, only a tiny fraction of which is actually active. Linux has a lot of drivers, many of them are crap, but it's not obvious to me that Linux would be better off with no driver than a crap driver.



That tiny fraction is quite huge. Filesystems and networking support are well known problematic areas, the sound system is a chapter by itself, and Linux is full of old, should-be-unused interfaces that attackers successfully use once in a while.

Besides, the core part of the kernel is way too big for anybody to read. And any of it can interact with any other part.



I don’t understand how this is still the best way to test if features are available in C. Can’t the OS / environment provide a “features_available” JSON blob listing all the features on the host system? Is AVX2 available on the cpu? OpenSSL? (And if so, where?) and what about kernel features like io_uring?

Doing haphazard feature detection by test compiling random hand written C programs in a giant sometimes autogenerated configure script is an icon of everything wrong with Unix.

This hack shows that the haphazard mess of configure isn’t just ugly. It’s also a pathway for malicious people to sneak backdoors into our projects and our computers. It’s time to move on.



>> Can’t the OS / environment provide a “features_available” JSON blob listing all the features on the host system? Is AVX2 available on the cpu? OpenSSL? (And if so, where?) and what about kernel features like io_uring?

There are some utilities like pkg-config [1] and /proc/cpuinfo [2] that try to provide useful configuration information in distribution agnostic ways.

[1] https://en.wikipedia.org/wiki/Pkg-config

[2] https://www.baeldung.com/linux/proc-cpuinfo-flags

>> Doing haphazard feature detection by test compiling random hand written C programs in a giant sometimes autogenerated configure script is an icon of everything wrong with Unix.

True, but it works quite well, which is why it is widely used. If you need to ensure that a desired feature is available to your C code, testing it with a small program before building makes a lot of sense. With different operating systems running various C compilers that all work slightly differently, it is a proven approach that achieves the needed outcome, however ugly it might be.
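
For readers who haven't peeked inside a configure script, the probes it compiles are tiny C programs along these lines. This one (checking for strlcpy) is made up for illustration rather than copied from any real script; what configure inspects is whether the compile-and-link step succeeds, not the program's output:

    /* conftest.c -- autoconf-style probe: does this platform provide strlcpy?
     * The configure script runs something like `cc conftest.c` and records
     * HAVE_STRLCPY based on whether compiling and linking succeeded. */
    #include <string.h>

    int main(void)
    {
        char buf[8];
        return (int)strlcpy(buf, "hi", sizeof buf);
    }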



A similar thing could easily be done with other approaches: a typo of a feature name would have the same effect. The main issue is autoconf is such a mess of layered bad scripting languages it's impossible to really get a good idea of what is actually going on.

In general I think feature detection is probably not necessary for most cases (especially the cases that a huge number of autoconf scripts do: the number of linux-only projects which have feature detection for a feature which is certainly present is ridiculous). It's much more reasonable to just try to build with all features, and provide manual flags to disable unwanted or unavailable ones. These at least mean the user/maintainer can decide more explicitly if that feature should be present or not.



A few things here.

First, yes, there's several tools which provide (incomplete) feature selection functionality, you can see some sibling comments for examples.

Second, especially in complex projects, the presence of a feature doesn't necessarily mean it's sufficiently complete to be workable. You can run into issues like, say, "I need io_uring, but I need an io_uring op added in version X.Y and so it's not sufficient to say 'do I support io_uring.'" Or you can run into issues like "this feature exists, but it doesn't work in all cases, particularly the ones I want to use it for."

Third, there's no real alternative to feature detection. In practice, build systems need to cope with systems that pretend to be other systems via incompletely-implemented compatibility layers. Version detection ends up creating the User-Agent problem, where every web browser pretends to be somebody pretending to be somebody pretending to be Netscape 5.x and if you try to fix this, the web breaks. (Not to mention the difficulty of sniffing versions correctly; famously, MS skipped Windows 9 reportedly because too many build systems interpreted that to mean Windows 95 or Windows 98 with catastrophic results).

The end result of all of this is that the most robust and reliable way to do feature detection is to try to use the feature and see if it works.



> In practice, build systems need to cope with systems that pretend to be other systems via incompletely-implemented compatibility layers.

That sounds fine though. If the system claims to provide feature X, you probably want the program in question to compile assuming feature X is available. If the compatibility layer doesn’t work as advertised, a compiler error is a great choice. Let the user choose to turn off that flag in their system configuration when building the project.

I’m not proposing user agent sniffing. I’m proposing something much more fine grained than that. Make something that looks more like the output of configure that build systems can use as input.



JSON is too far, says the engineering culture still relying on a pile of shell scripts like it’s 1970.

(that’s unfair, there’s probably tooling to build the shell scripts automatically I bet)



... The engineering culture which gave us 200kb fragile, semi-autogenerated configure scripts checked in to our repositories. Configure scripts which - as we've just seen - are a great place to hide malicious code.

I can't take this criticism seriously. 200kb of configure script = good, 1000 lines of JSON parser in bash = bad? What?



I think that would land harder if configure / automake / autoconf were actually a standard. And not, you know, a bunch of cobbled together shell scripts that generate other shell scripts.


I'd add a 9: performance differences can indicate code differences. Without the 0.5s startup delay being noticed the backdoor wouldn't have been found. It would be much easier to backdoor low-performance software that takes several seconds to start than something that starts nearly instantly.


> Or… just have downstream users run autotools as part of the build?

See point 3:

> 3. A corollary of (1) and (2) is that autotools is bad and the autotools culture is bad.



Whether it's autotools or not is not very relevant to point 1a. I'm also confused why point 1 leads to 1a, and not to the opposite of 1a.

Source distribution tarballs should not contain code different from what's in the source repository. They should not contain automatically generated artifacts, since those should not be in the repository, since they are by definition not the source, but output of some kind of build process.

Having the automatically generated configure script in the repository would have made it slightly easier to spot the backdoor if anyone took the time to read the committed configure script, but if it's already in the repository most people will just take that for granted, not run whatever process generates it, and not notice that it's not actually the output of said process.



I think the point is that all of the code which will get compiled to produce the final binary should be in the repo, and so any generated code that affects the final binary should be in the repo.

The use of autotools or other similar tools, ones that are supposed to generate code on the fly on the final user's machine, make this requirement essentially impossible.



That very last point seems like something that's fairly amenable to automation, though. (Then, that automation can be attacked, but that seems like one more [fairly independent] layer that must be bypassed to execute one of these attacks.)


Part of what's bad in autotools culture is running it when creating the release so people don't need to run it before building.

Letting people run autotools would completely avoid this one hack.

But well, you have a point in that most of what makes autotools bad is that you can't expect your userbase to learn how to use it.



9. We should move toward formal verification for the trusted core of systems (compilers, kernel, drivers, networking, systemd/rc, and access control).

With regard to 1, there are some other practical steps to take. Use deterministic builds and isolate the compilation and linking steps from testing. Every build should emit the hashes of the artifacts it produces and the build system should durably sign them along with the checksum of the git commit it was built from. If there need to be more transformations of the artifacts (packaging, etc.) it should happen as a separate deterministic build. Tests should run on a different machine than the one producing the signed build artifacts. Dropping privileges with SECCOMP for tests might be enough but it's also unlikely to be practical for existing tests that expect a normal environment.
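
As a rough sketch of the "emit the hashes of the artifacts" step, here is a digest routine using OpenSSL's EVP API. The choice of SHA-256 and the surrounding pipeline (signing the digest together with the git commit hash) are assumptions layered on the comment above, not a description of any existing build system:

    #include <openssl/evp.h>
    #include <stdio.h>

    /* Print the SHA-256 of one build artifact; a real pipeline would sign
     * this digest together with the source commit it was built from. */
    static int sha256_file(const char *path)
    {
        unsigned char md[EVP_MAX_MD_SIZE], buf[4096];
        unsigned int md_len = 0;
        FILE *f = fopen(path, "rb");
        if (!f)
            return -1;

        EVP_MD_CTX *ctx = EVP_MD_CTX_new();
        EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);

        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            EVP_DigestUpdate(ctx, buf, n);

        EVP_DigestFinal_ex(ctx, md, &md_len);
        EVP_MD_CTX_free(ctx);
        fclose(f);

        for (unsigned int i = 0; i < md_len; i++)
            printf("%02x", md[i]);
        printf("  %s\n", path);
        return 0;
    }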



> 1a. As a consequence of (1) autogenerated artifacts should always be committed.

I philosophically and fundamentally hate this suggestion, but have to agree with it. It's going to make porting harder, but is sadly a cost worth paying.

> dependencies are maintenance burden and a security risk, this needs to be weighted against the functionality they bring in

Tough call. A major library is more likely to be bug fixed and tuned than something you write (which is a good reason to use them which is what makes them attractive as an attack vector). Getting this right requires taste and experience. The comment says "depending on large libraries for small amounts of functionality [is bad but thought to be good]". What constitutes "small amount" vs large requires experience. Certainly cases of this tip my bias towards re-implement vs re-use.



I would more just say autogenerated artifacts should just be autogenerated by the build. Committing them doesn't really solve the problem. This is pretty much just a historical hangover in autotools where it targeted building on platforms where autotools wasn't installed, but it's not really a particularly relevant use-case anymore. (I do agree in general that autotools is bad. Especially on projects where a simple makefile is almost always sufficient and much more debuggable if autotools fails).

I don't think libsystemd is a particular problem. Or at least it being linked in only made the job of writing the exploit slightly easier: there's enough services running as root that will pull in a dependency like this that the compromise still exists, it just requires a few more hoops to jump through. And systemd has in fact deliberately made the notification process simple specifically so people can avoid the dependency (if not for security, then simply for ease of building in a way which supports systemd notification but doesn't need anything else).

Dependencies are a liability, for sure, but I think a lot of the reaction there is not entirely helpful. At least, the size of the dependency tree in a package manager is only about as good a proxy for the risk as the number of lines of code is for software project progress. Dependencies need to be considered, but not just minimised out of hand. There are plenty of risks on the reimplement-it-yourself side. The main thing to consider is how many people and who you are depending on, and who's keeping an eye on them. The latter part is something which is really lacking: the most obvious thing about these OSS vulnerabilities is that basically no-one is really auditing code at all, and if people are, they are not sharing the results. It should in principle be possible to apply the advantages of open-source to that as well, but it's real hard to set up the incentives to do it (anyone starting needs to do a lot to make it worthwhile).



> I would more just say autogenerated artifacts should just be autogenerated by the build.

There are practical and philosophical problems with this. From the practical point of view you generally want to make contributing (or even just building) your stuff as low friction as possible and having extra manual build steps (install tools X at version X1.X2.X3, Y at version Y1.Y2 and Z at version Z1.Z2rc2) isn't low friction.

Philosophically, you are just shifting the attack vector around, you now need to compromise one of tools X, Y and Z, which are probably less under your control than the artifacts they produce.

> And systemd has in fact deliberately made the notification process simple specifically so people can avoid the dependency

People say this but I'm skeptical, this is the actual documentation of the protocol:

"These functions send a single datagram with the state string as payload to the socket referenced in the $NOTIFY_SOCKET environment variable. If the first character of $NOTIFY_SOCKET is "/" or "@", the string is understood as an AF_UNIX or Linux abstract namespace socket (respectively), and in both cases the datagram is accompanied by the process credentials of the sending service, using SCM_CREDENTIALS. If the string starts with "vsock:" then the string is understood as an AF_VSOCK address, which is useful for hypervisors/VMMs or other processes on the host to receive a notification when a virtual machine has finished booting. Note that in case the hypervisor does not support SOCK_DGRAM over AF_VSOCK, SOCK_SEQPACKET will be used instead. The address should be in the form: "vsock:CID:PORT". Note that unlike other uses of vsock, the CID is mandatory and cannot be "VMADDR_CID_ANY". Note that PID1 will send the VSOCK packets from a privileged port (i.e.: lower than 1024), as an attempt to address concerns that unprivileged processes in the guest might try to send malicious notifications to the host, driving it to make destructive decisions based on them."

So technically you have to support unix domain sockets, abstract namespace sockets, whatever SCM_CREDENTIALS is, whatever AF_VSOCK is and the SOCK_SEQPACKET note is completely obscure to me.



No system is safe from bad actors.

The only way to armor yourself is to have consistent policies. Would this have happened if there were code reviews and testing?

Consistency is key. At my workplace we routinely bypass branch protections, but we're only responsible for a few customers.



Well, so no news?

But seriously, yes, I think I've seen people dismissing each one of those points. And now we have concrete proof they are real. The fact that somehow a scandal like this didn't happen before due to #1, 2, or 3 is almost incredible... in the sense that a viable explanation is that somebody is suppressing knowledge somewhere.

Point 8 simply isn't going to happen. And that means that if you want secure OSS, you must pay somebody to look around and verify those things. And the problem with that is this means you are now into the software vendor political dump - anybody that gets big doing that is instantaneously untrustworthy.

Overall, my point is that we need some actual democratic governance on software. Because it's political by nature, and pushing for anarchy works just as well as with any other political body.



This attack used the fact that several distros patch OpenSSH to link to libsystemd for notifications. Libsystemd links liblzma, and the backdoor checks if it's been linked into OpenSSH's sshd process to run. Without distro maintainers linking libsystemd, xz wouldn't have been a useful target for attacking OpenSSH.


There are a lot of bad to terrible takes here, ranging from hindsight 20/20 to borderline discriminatory:

3. The issue here has more to do with the fact that the generated tarball doesn't match the source. You (i.e. distro owners) should be able to generate the tarball locally and compare it with the released artifact. Autotools is just a scapegoat.

4. xz is used in a lot of places. Reducing dependencies is good, but trying to somehow say this is all systemd's fault, for depending on liblzma is not understanding the core issue here. The attacker could have found another dependency to social engineer into, or find a way to add dependencies and whatnot. It's very easy to say all these stuff in hindsight.

5. Again, I agree with you in principle that dependencies and complexity are a big issue, and I always roll my eyes when people bring in hundreds of dependencies, but xz is a pretty reputable project. I really, really doubt someone would have raised an issue with adding liblzma or thought that the build script would introduce a vulnerability like that. Again, a lot of hindsight talking here, instead of actually looking forward to how something like this could realistically be prevented. Too many dependencies are a problem, but it's not like everyone will suddenly write their own compression libs.

6. Again, I mean, I don't disagree with you in principle, but that is not the lesson from this particular incident. This may be your pet peeve but it wasn't like the integration with libsystemd would have raised anyone's alarm.

8. This is just a thinly veiled way of saying "don't work with anyone of Chinese descent". I don't want to use the R word but you know exactly what I mean. There's no evidence Jia Tan is Chinese anyway, or that this is done by China. We simply don't know right now, and as far as we know they could have used any western sounding name. The core issue here is that the trust was misplaced, and the overworked maintainer didn't try to make sure the other person is a real one (e.g. basic Googling). So what, if you don't work with any Chinese, if someone is called "Ryan Gosling" you automatically trust them?

---

I do agree with point 7.



>4. Libsystemd is a problem for the ecosystem. People get dismissed as systemd haters for pointing this out but it's big, complicated, has a lot of dependencies and most programs use a tiny fraction of it. Encouraging every service to depend on it for initialization notifications is insane.

They never did. In fact the systemd maintainers are confused on that point and are adding documentation on how to implement the simple datagram protocol without libsystemd.

>7. We need to make OSS work from the financial point of view for developers. Liblzma and xz-utils probably have tens of millions of install but a single maintainer with mental health problems.

Way more than tens of millions. Python, php, ruby and many other languages depend on libxml2, libxml2 uses liblzma. And there's many other dependencies.

>8. This sucks to say, but code reviews and handing off maintainership, at the moment, need to take into account geopolitical considerations.

Not any maintainer's job. OSS is provided without warranty. Also, there is indication that "Jia Tan" may have been completely fake, as their commit timestamps show their timezone switching from Eastern Europe to Asia, even on the same day. So at the very least, they were playing identity games.



I said this days ago, but re timezones - they are meaningless as even GCHQ and NSA etc will place false flags in code which has any kind of risk of exposure. I first learned about those techniques from all the high profile intelligence agency leaks from the USA who were performing those themselves.


It's not that simple. You can falsify a lot of things, but you can't easily falsify the working hours at which you reply to issues or push commits without taking a lot of care. Especially when DST has to be considered.

Sure, the +0800 timestamps are definitely fake. A handful of timestamps that were later scrubbed show +0200 and +0300, though. And all the commits match 9am to 6pm working hours if you interpret them as +0200/+0300. The working hours even shift around correctly with the DST change.

The issue is that russia doesn't observe DST anymore. That leaves Bulgaria, Cyprus, Estonia, Finland, Greece, Israel, Latvia, Lebanon, Lithuania, Moldova, Romania and Ukraine. Very few of those have the infosec capabilities needed for something like this.

Jia Tan was registered in 2021, but first sprung into action during the buildup to the russian invasion of Ukraine 2022.

Jia Tan also used a VPN provider that's headquartered in the US. That only makes sense if they're in a US-aligned country, as using a US VPN would give the US more insight into what you're doing, and only protect you from other countries.

Personally, I'd guess that it was Israeli intelligence. But Finland, where the original XZ author lives, is another interesting possibility.



Re the falsifying working hours, wouldn’t these boffins be able to automate Git commits at certain times or even pass instructions to another team who is working the late night shift to post these changes etc.

I went to McDonald's last night; it is open 24/7. These spy agencies surely aren't lazier than minimum-wage McDonald's employees - I am sure they work around the clock. Plus, you have night hawks like me who get more stuck into a project at 4am and sleep through the day.

Israeli intelligence, ah probably. Wouldn’t be surprised. I imagine if it was GCHQ, it wouldn’t have been so noisy and got uncovered like this.



> Re the falsifying working hours, wouldn’t these boffins be able to automate Git commits at certain times or even pass instructions to another team who is working the late night shift to post these changes etc.

Is it possible? Definitely. But that's extremely rare, especially if you want to keep a relatively natural pattern for the commits and replies.

You'd basically have to have a team of devs working at really odd times and a queuing system that automatically queues all emails, github interactions, commits, etc to dispatch them at correctly distributed timestamps.

And you'd need a source pattern to base your distribution on, which is hard to correctly model as well.

e.g., if someone slept badly one night, the next morning their interactions shift slightly back and are more sparse in the morning. Their lunch break will also shift due to that. Such changes usually are most prominent in the days surrounding DST changes.



> Is it possible? Definitely. But that's extremely rare, especially if you want to keep a relatively natural pattern for the commits and replies.

What sort of nonsense is this? Have you ever actually known any software developers? A huge number of them keep odd hours, moreso in the infosec sphere. They wouldn't need to automate anything, just start working hours that match the timezone that they're faking... If it really is a state actor, I imagine they'd be able to find someone willing to keep those hours.



I would have to agree with kushku, it's a very very high bar to fake thousands of timestamps over several years in a consistent way that doesn't attract suspicion under forensic scrutiny.

Of course this is post facto so not that helpful until after something serious happens.

If there was some sort of reputation system that could do this analysis automatically then that would be very useful.



You don't need to automate git commits to fake the timestamps; just change system clock to the desired time, make the commit and then reset the clock to local time when you're done. It all happens on the local machine so the timestamps of commits should be considered completely untrusted information.


That's exactly what Jia Tan did, but this failed a few times during rebases as well as with commits done from the web UI.

Additionally timestamps from comments on GitHub itself are trusted information and match the UTC+2/UTC+3 data well.



I think your eighth point is regrettable but mostly true - I'd soften it to professional relationships, which kind of sucks for anyone trying to get started in the field who doesn't get a job with someone established. It also adds an interesting wrinkle to the RTO discussion, since you might "work" with someone for years without necessarily knowing anything about them.

It also seems like we need some careful cultural management around trust: enshrine trust-but-verify pervasively, to avoid focusing only on, say, Chinese H-1Bs or recent immigrants (whoops, spent all of your time on them and it turns out you missed the Mossad and the Bulgarian hackers), and really double down on tamper-evidence, which also has the pleasant property of reducing the degree to which targeting OSS developers makes sense.

Combining your 7th point with that one, I’ve been wondering whether you could expand what happened with OpenSSL to have some kind of general OSS infrastructure program where everyone would pay to support a team which prioritizes supporting non-marquee projects and especially stuff like modernizing tool chains, auditing, sandboxing, etc. so basically any maintainer of something in the top n dependencies would have a trusted group to ask for help and be able to know that everyone on that team has gone through background checks, etc.



>4. Libsystemd is a problem for the ecosystem. People get dismissed as systemd haters for pointing this out but it's big, complicated, has a lot of dependencies and most programs use a tiny fraction of it. Encouraging every service to depend on it for initialization notifications is insane.

This is ridiculous; nobody "encourages" every service to depend on it for initialization notifications - you can implement the logic yourself in 10 lines of code or less.
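For illustration, a minimal sketch of that logic (here in Python; the C version is about the same size). It assumes only the documented NOTIFY_SOCKET behaviour and is not a drop-in replacement for libsystemd:

    import os
    import socket

    def notify_ready():
        # systemd passes the notification socket path in the environment.
        addr = os.environ.get("NOTIFY_SOCKET")
        if not addr:
            return  # not running under a notify-aware service manager
        if addr.startswith("@"):
            # A leading '@' means an abstract-namespace Unix socket.
            addr = "\0" + addr[1:]
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
            sock.sendto(b"READY=1", addr)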



To clarify (1), we should not be exchanging tarballs, period. Regardless of whether it's different from what's in the source repository.

It's 2024, not 1994. If something masquerading as open-source software is not committed to, and built from, a publicly verifiable version-controlled repository, it might as well not exist.



The other problem is that C’s engineering culture is termites all the way down.

A test resource getting linked into a final build is, itself, a problem - the tooling should absolutely make this difficult, and transparent/obvious when it happens.

But that’s difficult because C never shed the “pile of bash scripts” approach to build engineering… and fundamentally it’s an uphill battle to engineer a reliable system out of a pile of bash scripts.

The oft-discussed problems with undefined behavior, obscure memory/aliasing rules, etc are just the obvious smoke. C is termites all the way down and really shouldn’t be used anymore, it’s just also Too Big To Fail. Like if the world’s most critical infrastructure had been built in PHP.



libsystemd is too juicy of a target, especially with the code reuse that does not appear to take into account these attack vectors.

Perhaps any reuse of libraries in sensitive areas like libsystemd should require a separate copy and more rigorous review? This would allow things like libxv to be 'reused', but the 'safe' versions would require a separate codebase that gets audited updates from the mainline.



No they did not. The dependency is still there, it's just being lazy loaded.

This would have prevented this particular exploit, which would have needed to take another approach, but at the price of making dependencies invisible and hard to debug. You could no longer have found vulnerable systems by way of ldd.
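One way to keep some visibility in that world is to look at what a running process actually has mapped, rather than what ldd reports for the on-disk binary. A rough sketch (assumes Linux and enough privileges to read the target's /proc entry; a lazily loaded library only shows up here after it has actually been loaded):

    import sys

    def lzma_mappings(pid):
        """Return the liblzma paths currently mapped into the given process."""
        found = set()
        with open(f"/proc/{pid}/maps") as maps:
            for line in maps:
                fields = line.split()
                # Field 6 (when present) is the backing file of the mapping.
                if len(fields) >= 6 and "liblzma" in fields[5]:
                    found.add(fields[5])
        return found

    if __name__ == "__main__":
        print(lzma_mappings(int(sys.argv[1])) or "no liblzma mapped")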

The only solution for the attack surface of systemd is to make the individual components more loosely coupled. There is no reason the same library is responsible for readiness reporting and reading logs.

One could even argue that none of those functions have anything to do with the job of init. Readiness can break in a number of ways; robustness is built on health checks.



Even if it hadn't been loaded by libsystemd, liblzma is also loaded by SELinux, which would have allowed the same vulnerability via a different vector.

Personally I think projects like fedora silverblue/kinoite and other container-based OSes are going in the right direction. We need a base OS that's as small as possible so it can be audited, and everything else then needs to live in a container so it doesn't have to be audited but is still secured properly.



Well, not the same vulnerability, as libselinux isn't loaded by sshd. Any such request would probably have a low probability of acceptance among the openssh maintainers.

If anything, I think this shows that real world security is hard and must happen at every level. This library is likely to be included in any base OS no matter how small, and rebuilding the container world just to patch is inefficient.

This attack may have been found by luck alone, even if that luck involved having talented developers on our side, but it really showed how well the open source community responds to such attacks. Within a day of it being public, we had well mapped out what the problem was and how to best respond to it. A day that was also a holiday in large parts of the world.



You are accusing GP of something very serious, and while I see the direction you're coming from, I don't understand how accusations or insinuations like the one you're providing here are acceptable without significant additional justification or explanation.

Does that make sense? Do you think there's a chance that the person you're replying to is "not racist"?



Let’s step away from the extremist position and there’s an interesting problem underneath.

If a company or government organization is hiring engineers who will be responsible for critical code, they will undergo background checks at a minimum to ensure they are not an obvious threat to the organization. This isn’t about “country of origin”, but about the criminal history and ties of the individual regardless of origin.

In a government setting, security clearances will be required on top of basic background checks.

The FOSS community doesn’t have the tools to do this kind of screening, and arguably the openness of the community is what makes it successful.

But the uncomfortable reality is that there are indeed malicious actors, and the community doesn’t currently have a mechanism to proactively identify them and instead relies on discovering the results of their malicious behavior.

I don’t claim to know the solution, or if there even is one, but the existence of this comment thread is exhibit A for why we need to be thinking hard about how to proceed as a community.

FOSS projects are in the big leagues now, and will continue to come under threat from increasingly sophisticated actors.



I read the comment as -- we shouldn't hand control of valuable open source projects over to probable operatives of government security services. (Indeed, as many have mentioned, US intelligence might be behind this.)

Do you disagree with that? Or do you think OP meant something different? Because if they did it went right over my head.



I read it more as "random people on the internet may be part of some country trying to backdoor software, because countries are engaged in cyber warfare, so actually meet the people you want to hand over access to"


I didn't realize it was a Microsoft engineer that works on Azure Postgres that found the issue.

Thanks, Microsoft, I like Azure now.



And I think people should now look at all oddities with Valgrind, since that is how the issue got discovered. And then look at any problematic library for similar outliers, like fake personas taking over the project.

It seems it is common practice for people to ignore these errors.



I remember this, and that is a good write-up - thanks for the link. I have seen things like this over the past 30+ years. So a couple of items to add:

* Comment the hell out of hidden logic like this. Explain why, not what :)

* Better yet, even though that uninitialized buffer helped with performance, these days it would be better to take the hit and explicitly initialize the buffer with random data - maybe read from /dev/urandom or some other source.

You do not know who will come along after you, so try and make things explicit.



There was an upstream OpenSSL bug there: they depended on reading from uninitialized memory to add entropy and thus increase startup speed of their RNG. But reading from uninitialized memory is undefined behavior, it's not guaranteed to add any entropy and should always be treated as a security bug. The Debian maintainers tried to fix the bug, but screwed up the fix. They should have reported the bug upstream, not just made their own fork.


My memory is fuzzy. In that case, I'd blame the OpenSSL devs if the report wasn't a patch with the (non-working) "fix". Even if it was, they should have accepted the bug & rejected the patch.


Valgrind will tell you about memory leaks and won't always behave the way it did here when there's a backdoor. In this case it just so happened that valgrind was throwing errors because the stack layout didn't match what the exploit was expecting. Otherwise valgrind would have probably worked without issues.


> the stack layout didn't match what the exploit was expecting.

What does that mean? Why is the exploit expecting something from the stack layout and why does valgrind complain?



I am also curious, and wonder if something like ASan would also have found it. It seems social engineering was used to get MS to stop fuzzing the library for malicious code, so if the malicious party expected the Valgrind behavior they might have removed it as well.


Which to me is a very carefully orchestrated thing. You don't just spend 2 years of your life doing that. No loner would pre-plan this to such an extent and create sockpuppet accounts for all this.


That's because you're a normal, well adjusted person.

Consider TempleOS[1] which was created by a programmer having a series of manic episodes which he believed was God's instruction to create 640x480 pixels of perfection.

He spent the rest of his life on this.

People vastly underestimate the tenacity of individual fixated people: so much so that, in the physical world, victims usually feel isolated by their peers, who just don't believe the degree of effort their stalker will actually go to.

[1] https://en.m.wikipedia.org/wiki/TempleOS



TempleOS feels a little different because Terry was fairly well-known in the community and didn't try to hide his identity. I'm pretty sure he went to conferences and has met with actual people who could verify his identity.

I haven't seen proof that Jia Tan is a real person and to me that's the most malicious part of the attack. I'm pretty confident that whoever is hiding behind the Jia Tan identity is a well adjusted individual (or group) and knows exactly what they're doing. It feels far too coordinated and careful to chalk up to a psychotic episode or manic behavior.



Exactly.

Also remember this

>> odd valgrind complaint in automated testing of postgres

I would imagine compiling a list of odd complaints may yield something, or nothing at all.



I’m guessing the original maintainer of xz handed responsibilities to Jia Tan without ever seeing him/her or at least sharing a phone call. Is it common to communicate only through email/GitHub? I guess some maintainers of open source projects will be more cautious after this story.


> Is it common to communicate only through email/GitHub?

Absolutely. I've both taken over libraries as a maintainer and given away the responsibility of maintaining a library after only communicating via text, and having no idea who the "real" person is.

> I guess some maintainers of open source projects will be more cautious after this story.

Which is completely the wrong takeaway. It's not the maintainer who is responsible for what people end up pulling into their project; it's up to the people who work on that project. Either you trust the maintainer or you don't, and when you start to depend on a library, you're implicitly signing up to keep track of who you are trusting. For better or worse.



That’s basically how it is right now. Millions of companies freeloading off the work of unpaid open source developers. Unsurprisingly they sometimes leave and it causes problems.


> Is it common to communicate only through email/GitHub?

Yes. I’ve joined half a dozen open-source projects of various sizes (from 100 to 30k stars on GitHub) without ever calling anyone; written communication is the standard.



Have you ever interacted with a volunteer organization?

If you show up for a tea & cookies meet-and-greet and aren't careful, they'll nominate you for chair just because no one else wants it, and "showed up once to a scheduled event" is a higher bar than half the other members have met in a while.



If you’re being berated by multiple people about your speed of delivery, it’s not unexpected to become convinced that you are somehow the problem, and to transfer the project to whoever feels like the best choice at the time, without thinking the decision through.

However, knowing a person personally doesn’t necessarily solve the problem.

I used to work on an open source project a long time ago (under a pseudonym) that I do not wish to name here for reasons that’ll become clear shortly. The lead programmer had a co-maintainer who the lead seemed to have known quite well.

The co-maintainer constantly gaslit me, and later, other maintainers, belittled them, criticized them for the smallest of bugs etc. (and not in a Linus Torvalds way, where the rants are educational if you remove the insults) until they left; and was egged on by the lead maintainer as they agreed with the technical substance of these arguments.

Many years later, the co-maintainer attempted a hostile takeover of the project, which did not go as expected, and soon after, multiple private correspondences with other people became public, making it clear that the co-maintainer had always wanted to do this, and that gaslighting other maintainers was just part of that goal. All of this despite the fact that the two of them knew each other.



They did communicate off-list and non-publicly; that's as much as we know at the moment.

As an open source developer he might have received donations from the adversary too - it's reasonably common for devs to get donations to "say thanks". He might have had voice chats with them, who knows. The emails might be with LEO at the moment, but I think it's in the public interest for all communications to be released.



If LEO is involved, they wouldn't be disclosing evidence to avoid the public interacting with suspects or possibly leapfrogging them and tipping off someone new.

In this case the public would benefit from knowing quickly who are the bad actors and what other projects they touched.



Can we not dogpile Lasse after his vacation was ruined by this. He has much bigger concerns right now than trying to export and sanitize his entire communication history with Jia.


I have a lot of respect for xz's original author, I just didn't think about the legal stuff, and that sounds quite reasonable to me now.

Personally, I find it hard to subscribe to certain theories, such as the possibility of Lasse being impersonated or involved in the incident. But that doesn't mean we should dismiss them outright at this stage. (And I'm sorry if you don't like to hear that; saying this is not comfortable for me either.)



What does it change? Assuming that either:

- Jia Tan was initially a trustworthy actor that subsequently became malicious (maybe they were paid or compromised somehow)

- Jia Tan was always malicious, but played the long game by starting with legitimate contributions/intent for 1-2 years

How would meeting them for real have any impact?



If you look at their early commit history, "Jia Tan" was always a devious actor.

It's easy to think that they would just have made a video call, but it is a lot harder to lie convincingly over sync videochat than over async text. And a lot harder still to lie in person, and esp over multiple meetings.

Not to say it's impossible, people get scammed in person all the time! But it raises the bar, for sure.



I guess the blame is on the people who decide to depend on a very small (by team size, at least) project while having plenty of safer alternatives: https://xkcd.com/2347/

Let's suppose I create a personal hobby project. Suddenly RedHat, Debian, Amazon, Google... you name it, decide to make my project a fundamental dependency of their toolchain, without giving me at least some support in the form of trustworthy developers. The most cautious thing I could do would be to shut down or abandon the project entirely, but more probably I would have fallen for Jia Tan's tricks.

Also, a phone call or even a face-to-face meeting wouldn't give you extra security. In what scenario would a phone conversation with Jia expose him, or make you suspicious enough not to delegate?



Our goodwill is being used against us.

Suppose you have a chat with them and see that they're Chinese. What are your next actions? If you exclude them then that's racist right?

I don't have answers



Adding on to that, it might be difficult to differentiate between people from China vs Taiwan/Singapore/etc., and since people are generally anonymous online, they can use any name they want.


So while everyone thinks this backdoor was caught early, its purpose might have been achieved already - especially if its targets were developers who used rolling release distros, like Kali and Debian.


>and argued that Lasse Collin, the longtime maintainer of xz Utils, hadn’t been updating the software often or fast enough.

This meme was a mistake.



"OpenSSH, the most popular sshd implementation, doesn't link the liblzma library, but Debian and many other Linux distributions add a patch to link sshd to systemd, a program that loads a variety of services during the system bootup. Systemd, in turn, links to liblzma, and this allows xz Utils to exert control over sshd."

Compare with:

"Xz is an open-source compression program, as well as a library that can be used to help you write your own program that deals with compressed data. It is used by a fairly large number of other programs, one of which is OpenSSH."

https://news.ycombinator.com/item?id=39881049

GNU's binutils links to liblzma. binutils is even more ubiquitous than OpenSSH; in most cases it's probably used in the compilation of OpenSSH, the operating systems on which sshd runs, and so on. The bad guys certainly picked a good project to potentially get deep into open source software.



It does not.

OpenSSH pulled in libsystemd to provide startup notification. Libsystemd pulled in liblzma. No code from liblzma normally ends up in OpenSSH. But because it is built as a dependency for libsystemd, its build scripts are run in the same environment as libsystemd and OpenSSH.

The attack payload was hidden as an obfuscated binary blob in the liblzma tests directory, masquerading as a compression test case. When lzma was compiled from the git sources, generating the build scripts using autotools, nothing untoward was done. But lzma was also provided as a source tarball, used by distro packagers, that had the autotools output already generated. The attacker replaced the autogenerated, unreadable script output with one that checked if liblzma was being compiled in the same environment as OpenSSH and if it was being compiled so that it would end up in a .deb or .rpm package, and if both were true, embedded the attack payload into OpenSSH.

Then the attack payload started with a lot of checks, including testing whether OpenSSH was being started normally by init scripts or manually, and checking for the presence of the usual debugging tools, and it only attached the payload to the running process if it looked like a "natural" bootup with no debugging tools running. When running, the payload hooked into private key verification, and if the correct private key attempted to log in, the payload would take the rest of the incoming packet and call system() with it - that is, provide remote code execution as root.



> pulled in libsystemd to provide startup notification

This seems sort of fine (although...why can't said notification be done by writing simple text to a pipe/file/socket?), but the library shouldn't be some kitchen-sink thing that links to the universe of attack surface.



Notice that the protocol specification is way more complex than that, and has already changed since systemd was released.

All the library does is send some data through the socket, but that's not at all what the docs tell you to do.



It comes from the idea that your computer shouldn't stop working at random.

But surely what is needed is "interfaces can never be changed without redefining your project in a way that makes it absolutely obvious it's incompatible with its past". Systemd fails that one too.



Even more, why can't information like this be passed up the call chain as integer return codes? That scheme has been perfectly functional for decades, all the way back to at least BSD's rc scripts, maybe before.


> with one that checked if liblzma was being compiled in the same environment as OpenSSH

No, the attack didn't check for OpenSSH at build time, it checked whether it was injected into the ssh process at runtime.



I'm curious as to why they picked the commit cadence they did. Why do this over the course of two years and not, say, 8 or 15 months? After committing the first patch, why did they wait x days/weeks/months to commit the second? Were they timing the commits off of release schedules, following some predetermined schedule, or something else?


> Malicious updates made to a ubiquitous tool were a few weeks away from going mainstream.

Imagine working, as an individual or as a group, for years and then getting caught mere weeks or months before most major distros were to incorporate your backdoor.

Someone or several people out there must be pissed off.



It amuses me how the default hacker stereotype changed from an autistic nerd (no offense) to a state-backed US/Chinese/Korean/Israeli/Russian APT group.

I can absolutely imagine people like crimew pwning sshd because it's fun! it's interesting! it's a way to get people to think more about open source community! just why not?



Absolutely. The old "Dark Hoodie" stereotype is starting to get old.

However, I know that some of the Russian/East European teams were/are composed of a bunch of nerdy types that are rather loosely associated with state sponsors.

It's entirely possible that "Jia Tan" is a contractor that is hired to do the work, so even if we figured out who they were, we might never know who was pulling the puppet strings.



I suspect that something like this could have been pulled off by a single actor with the time and effort and skill needed, so literally every country in the world could have done it, theoretically.

My bet is on the Vatican City elite hackers.

(It's sophisticated, but it could have been done by a "commercial hacking group" for other purposes, especially to sell; if this had gotten into live RedHat systems it would be quite the valuable 0day.)



It sounds unlikely that he was an individual working on his own. So if the organisation he worked for has another Snowden, yes. (I am not saying it is necessarily the same organization, there are at least 4 obvious candidates and it could be a more surprising one.)


I have no idea if you're correct or not, but that doesn't really indicate who was behind it, other than suggesting that it might NOT be Chinese state actors, because that would be way too obvious a giveaway when investing multiple years of effort into a stealthy project.


To be honest, I’m less worried about the person(s) behind the alias. It can be anybody at this point.

Maybe it’s China? Or maybe it’s Russia using Chinese VPNs and aliases? Oh wait maybe it’s Israel…

These are questions that the political asshats down in Washington DC ask every time shit like this happens. It’s almost always inconclusive.

I would rather focus attention on what can be done to improve software supply chain security.



Are we ever going to figure out who Satoshi is? Probably not anytime soon, but we can look for clues. Jia was obviously interested in OSS security and fuzzing[0], but my wild guess is that s/he is not a state actor. I would rather assume s/he is an opportunistic hobbyist hacker who got triggered by the thought "If I can exploit this, why not?". I assume s/he intended to build a botnet and do whatever s/he came up with. The initial motivation could've been, like I said, opportunism and perhaps the technical challenge of exploiting the software.

[0] https://github.com/JiaT75/oss-fuzz



I think this is somewhat unlikely. Timezone/timestamp analysis of their commits seems to show them working on it as a day job. And that they were obfuscating their location from the get-go (not 100% successfully). It may not have been a state actor or even paid, but it seems like they started with at least the intent to deceive about their identity and origin, and that they were working on it as more than just a hobby.


> Either way, it's your preference and I will follow your lead. Jia Tan

> It's out of the scope for this patch, but it is something worth considering. Just trying to do my part as a helper elf! Jia Tan

Sounds like someone pulling the strings and using the "it's your idea, I'm just following!" strategy.

My hunch from reading over all the language used is that this person spent a good deal of time in America and has a carefully crafted 'customer service' manner of speaking. I may be wrong on the spending time in America part, but they are most definitely used to putting people at ease with their word choice.

I also found this bit interesting, as it's one of the few times they referred to "us" and "we"

> https://www.mail-archive.com/[email protected]/msg00644.h...

> Please let us know if there are any concerns about the license change. We are looking forward to releasing 5.6.0 later this month!



From what I've seen (and I've not seen much, mind you), someone went through and carefully categorized various libraries by their ability to be injected into OpenSSH on target systems AND by whether they were lightly maintained, if at all.

xz was the winner, but there are likely others that could have been used.
