This is very likely just a mistake and not deliberate. a) Absolutely nobody uses cmake to build this package. b) If you try to build the package with cmake and -DENABLE_SANDBOX=landlock, the build just fails (https://i.imgur.com/7xbeWFx.png). The "." does not disable sandboxing; it just makes it impossible to build with cmake. If anyone had ever actually tried building it with cmake, they would have gotten the error and realized that something was wrong. It makes absolutely no sense that this would be a malicious attempt to reduce security.
A disturbing thought here is that unit tests opened up an attack vector. Without the tests this would have been much harder to hide.
It wasn't the writing style, it was the "let's put AI in it" content that triggered me. No, it's not a valid idea; trusting LLMs with this would be plain catastrophic, given all their hallucinations.
Not only were they a co-maintainer, but if you're relying on code review to ensure correctness and security, you've already lost the battle. Code reviews are more about education and de-siloing.
Unless I'm misunderstanding, all this code was embedded and hidden inside the obfuscated test files. None of this would have been visible in commits or diffs at all.
But what are you suggesting exactly? The code fragment you quoted was awk code. Awk is a generic programming language. Any programming language can be written to be complex and unreadable.
> Any programming language can be written to be complex and unreadable.
The question is: as the lead developer reviewing a commit with a complex and unreadable code snippet, what would you do?
You would reject it of course, which is exactly why this code never appeared in a commit. Stage 0 of the exploit was not checked in, but directly added to the autogenerated build script in the release tarball, where, even if someone did review the script, it looks plausible next to the other autogenerated build gunk. The complex and unreadable scripts in the later stages were hidden inside binary test files, so no one reviewing the commit that added them (https://git.tukaani.org/?p=xz.git;a=commit;h=cf44e4b) would directly see that code.
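To make that concrete, here is a deliberately simplified sketch, in Python rather than the chained shell tools the real attack reportedly used, and with a made-up filename, of how a script can ride inside an apparently corrupt binary test file and be recovered by a short extraction step that hides easily among autogenerated build output:

```python
# Simplified illustration only: the real xz backdoor used chained shell
# tools (head/tail/tr/awk) and different file names, not Python.
import lzma

# Attacker side: compress the hidden script, then prepend junk bytes so
# the file no longer parses as a valid .xz stream. To a reviewer it is
# just another deliberately corrupt test input.
payload = b'echo "stage 1 would run here"\n'
with open("bad-corrupt_example.xz", "wb") as f:
    f.write(b"\x00" * 64 + lzma.compress(payload))

# Build-script side: skip the junk prefix and decompress. A line like
# this is easy to bury in hundreds of lines of generated configure/m4
# output, where it looks like any other build plumbing.
with open("bad-corrupt_example.xz", "rb") as f:
    recovered = lzma.decompress(f.read()[64:])
print(recovered.decode())  # the hidden script, ready to be executed
```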
The commit messages for the test files claim they used an RNG to generate them. The guy making the release tarball then put the final line in the right place without checking it in.
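Worth noting: a claim like that is only verifiable if the generator and its seed are committed alongside the files. A minimal sketch of what a reviewable version could look like (the script, seed, and file name here are all hypothetical):

```python
# Hypothetical reviewable generator: commit this script and its seed
# instead of an opaque binary blob, so anyone can regenerate the file
# and diff it against what is in the tree.
import hashlib
import random

SEED = 20240329  # arbitrary fixed seed, committed with the script

rng = random.Random(SEED)
data = bytes(rng.getrandbits(8) for _ in range(4096))

with open("random-4096.bin", "wb") as f:
    f.write(data)

# Publishing the expected digest lets CI catch a swapped-out file.
print(hashlib.sha256(data).hexdigest())
```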
Can we start considering binary files committed to a repo, even as data for tests, to be a huge red flag? To the greatest extent possible, the binary files themselves should instead be generated at testing time by source code that's stored as reviewable cleartext (though I think this might be very difficult for some situations). This would make it much harder (though of course we can never really say "impossible") to embed a substantial payload in this way.

When binary files are part of a test suite, they are typically trying to illustrate some element of the program being tested, in this case a file that was incorrectly xz-encoded. Binary files like these weren't typed by hand; they will always ultimately come from some plaintext source, modulo whatever "real world" data came in, like randomly generated numbers, audio or visual data, etc.

Here's an example! My own SQLAlchemy repository has a few binary files in it! https://github.com/sqlalchemy/sqlalchemy/blob/main/test/bina... oh noes. Why are those files there? Well, in this case I just wanted to test that I can send large binary BLOBs into the database driver and I was lazy. This is actually pretty dumb: the two binary files here add 35K of useless crap to the source, and I could just as easily generate this binary data on the fly using a two-liner that spits out random bytes. Anyone could see that two-liner and know that it isn't embedding a malicious payload. If I wanted to generate a poorly formed .xz file, I'd illustrate source code that generates random data, runs it through .xz, then applies "corruption" to it, like zeroing out the high bit of every byte (see the sketch below). The process by which this occurs would be all reviewable in source code.

Where I might be totally off here is if you're an image processing library and you want to test filters on an image, and you have the "before" and "after" images, or something similar for audio information, or other scientifically-generated real-world datapoints that have some known meaning. That might be difficult to generate programmatically, and I guess even if said data were valid, payloads could be applied steganographically. So I don't know!

But just like nobody would ever accept a PR that has a "curl https://some_rando_url/myfile.zip" inside of it, we should not accept PRs that have non-cleartext binary data in them, or package them, without really vetting the contents of those binary files. The simple presence of a binary file in a PR can certainly be highlighted; GitHub could put a huge red "BINARY FILES IN THIS PR" banner on it. Downstream packagers for distros like Debian, Red Hat etc. would ideally be similarly skeptical of new binary files that appear in the source downloads, and tooling can be applied to highlight the appearance of such files. Packagers would be on the hook to confirm the source of these binary files, or ensure they are deleted (even disabling tests if necessary) before the build process is performed.
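A minimal sketch of that recipe, assuming Python's standard lzma module (the helper name is made up):

```python
# Sketch of the approach described above: build the malformed .xz test
# input from reviewable source at test time, instead of committing a blob.
import lzma
import os

def make_corrupt_xz(size: int = 4096) -> bytes:
    """Generate random data, xz-compress it, then corrupt the stream
    by zeroing the high bit of every byte."""
    compressed = lzma.compress(os.urandom(size))
    return bytes(b & 0x7F for b in compressed)

# A decoder test can then assert that decompression fails cleanly, and
# a reviewer can see at a glance that nothing is being smuggled in.
try:
    lzma.decompress(make_corrupt_xz())
except lzma.LZMAError:
    print("decoder rejected the corrupted stream, as expected")
```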
That's for two reasons: 1. It might not be in the place where you're looking: it exists in the m4 in the release tarballs, not in the git repo. 2. It's highly obfuscated.
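That first point suggests a concrete check downstream packagers could run: diff the release tarball against the tagged git tree, so tarball-only files stand out. A rough sketch using only the Python standard library (the tarball name and checkout path are placeholders):

```python
# Rough sketch: report files present in a release tarball but absent
# from a checkout of the corresponding git tag.
import pathlib
import tarfile

TARBALL = "xz-5.6.1.tar.gz"        # downloaded release artifact
CHECKOUT = pathlib.Path("xz-git")  # git clone + checkout of the tag

with tarfile.open(TARBALL) as tar:
    # Drop the leading "xz-5.6.1/" component from each member name.
    tarball_files = {
        m.name.split("/", 1)[1]
        for m in tar.getmembers()
        if m.isfile() and "/" in m.name
    }

repo_files = {
    str(p.relative_to(CHECKOUT))
    for p in CHECKOUT.rglob("*")
    if p.is_file() and ".git" not in p.parts
}

# Autotools output (configure, Makefile.in, *.m4) legitimately appears
# only in the tarball, which is exactly the noise the attacker hid in.
for extra in sorted(tarball_files - repo_files):
    print("only in tarball:", extra)
```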
Never allow complexity in code or so-called engineers who ask to merge tons of shitty code. Get rid of that shit and don't trust committers blindly. Anyone who enables this crap is also a liability.
You do realize that "that shit" was part of the obfuscated and xz-compressed backdoor hidden as a binary test file, right? It was never committed in plain sight. You can go to https://git.tukaani.org/xz.git and look at the commits yourself – while the commits of the attacker are not prime examples of "good commits", they don't have glaringly obvious red flags either. This backdoor was very sophisticated and well-hidden, so your comment misses the point completely.
The whole XZ drama reminds me of this[1]; in other words, verify the identity of open source maintainer/s and question their motive for joining the open source project. It also reminded me of the relevant XKCD[2].

Speaking of obfuscation: I'm not a programmer, but I did some research in Windows malware RE, and what stuck with me is that any code that is obfuscated, and any code that is unused, is automatically suspicious. There is no purpose for obfuscated code in an open source non-profit software project, and there is no purpose for extra code that is unused. Extra/redundant code is most likely junk code meant to confuse the reverse engineer when they are debugging the binary.

[1] https://lwn.net/Articles/846272/
[2] https://xkcd.com/2347/
> verify the identity of open source maintainer/s and question their motive for joining the open source project.

This kind of goes against the whole "free" thing.
This, to me, is the most important question. There is no way Andres Freund just happened to find the _only_ backdoored popular open source project out there. There must be like a dozen of these things in the wild?