二进制发行版重建
Binary Distribution Rebuilds

原始链接: https://blog.josefsson.org/2025/03/31/on-binary-distribution-rebuilds/

Reproduce.Debian.net 项目的目标是通过使用原始构建输入,实现 Debian 软件包 100% 可重复构建。然而,仅仅重现旧的二进制文件不足以解决根本的信任问题,因为它依赖于最初的二进制文件,这些文件可能会随着时间的推移而无法构建。 作者提出了“幂等重建”的方法,这是一个迭代重建整个 Debian 存档(阶段 #0)的过程,使用结果构建一个新的镜像,然后再次重建存档(阶段 #1),以此类推。目标是达到一个点(阶段 #N),其中存档与前一阶段(N-1)相同。这将证明一种更强大的可重复性形式。 挑战包括构建时间戳和非确定性输出。下一步是发布阶段 #0 的构建结果(已经实现了 30% 的可重复性),然后使用这些构件构建阶段 #1,期望提高可重复性。最终目标是从不同的、可引导的环境(如 Guix)引导 Debian,提供更强大的安全性和信任保证。作者还对非自由固件可能阻碍自我重建表示担忧。

Hacker News 最新 | 往期 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 二进制分发重建 (josefsson.org) JNRowe 1小时前 3 分 | 隐藏 | 往期 | 收藏 | 讨论 加入我们,参加6月16日至17日在旧金山举办的AI创业学校! 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请YC | 联系我们 搜索:
相关文章

原文

I rebuilt (the top-50 popcon) Debian and Ubuntu packages, on amd and arm64, and compared the results a couple of months ago. Since then the Reproduce.Debian.net effort has been launched. Unlike my small experiment, that effort is a full-scale rebuild with more architectures. Their goal is to reproduce what is published in the Debian archive.

One differences between these two approaches are the build inputs: The Reproduce Debian effort use the same build inputs which were used to build the published packages. I’m using the latest version of published packages for the rebuild.

What does that difference imply? I believe reproduce.debian.net will be able to reproduce more of the packages in the archive. If you build a C program using one version of GCC you will get some binary output; and if you use a later GCC version you are likely to end up with a different binary output. This is a good thing: we want GCC to evolve and produce better output over time. However it means in order to reproduce the binaries we publish and use, we need to rebuild them using whatever build dependencies were used to prepare those binaries. The conclusion is that we need to use the old GCC to rebuild the program, and this appears to be the Reproduce.Debian.Net approach.

It would be a huge success if the Reproduce.Debian.net effort were to reach 100% reproducibility, and this seems to be within reach.

However I argue that we need go further than that. Being able to rebuild the packages reproducible using older binary packages only begs the question: can we rebuild those older packages? I fear attempting to do so ultimately leads to a need to rebuild 20+ year old packages, with a non-negligible amount of them being illegal to distribute or are unable to build anymore due to bit-rot. We won’t solve the Trusting Trust concern if our rebuild effort assumes some initial binary blob that we can no longer build from source code.

I’ve made an illustration of the effort I’m thinking of, to reach something that is stronger than reproducible rebuilds. I am calling this concept a Idempotent Rebuild, which is an old concept that I believe is the same as John Gilmore has described many years ago.

The illustration shows how the Debian main archive is used as input to rebuild another “stage #0” archive. This stage #0 archive can be compared with diffoscope to the main archive, and all differences are things that would be nice to resolve. The packages in the stage #0 archive is used to prepare a new container image with build tools, and the stage #0 archive is used as input to rebuild another version of itself, called the “stage #1” archive. The differences between stage #0 and stage #1 are also useful to analyse and resolve. This process can be repeated many times. I believe it would be a useful property if this process terminated at some point, where the stage #N archive was identical to the stage #N-1 archive. If this would happen, I label the output archive as an Idempotent Rebuild of the distribution.

How big is N today? The simplest assumption is that it is infinity. Any build timestamp embedded into binary packages will change on every iteration. This will cause the process to never terminate. Fixing embedded timestamps is something that the Reproduce.Debian.Net effort will also run into, and will have to resolve.

What other causes for differences could there be? It is easy to see that generally if some output is not deterministic, such as the sort order of assembler object code in binaries, then the output will be different. Trivial instances of this problem will be caught by the reproduce.debian.net effort as well.

Could there be higher order chains that lead to infinite N? It is easy to imagine the existence of these, but I don’t know how they would look like in practice.

An ideal would be if we could get down to N=1. Is that technically possible? Compare building GCC, it performs an initial stage 0 build using the system compiler to produce a stage 1 intermediate, which is used to build itself again to stage 2. Stage 1 and 2 is compared, and on success (identical binaries), the compilation succeeds. Here N=2. But this is performed using some unknown system compiler that is normally different from the GCC version being built. When rebuilding a binary distribution, you start with the same source versions. So it seems N=1 could be possible.

I’m unhappy to not be able to report any further technical progress now. The next step in this effort is to publish the stage #0 build artifacts in a repository, so they can be used to build stage #1. I already showed that stage #0 was around ~30% reproducible compared to the official binaries, but I didn’t save the artifacts in a reusable repository. Since the official binaries were not built using the latest versions, it is to be expected that the reproducibility number is low. But what happens at stage #1? The percentage should go up: we are now compare the rebuilds with an earlier rebuild, using the same build inputs. I’m eager to see this materialize, and hope to eventually make progress on this. However to build stage #1 I believe I need to rebuild a much larger number packages in stage #0, it could be roughly similar to the “build-essentials-depends” package set.

I believe the ultimate end goal of Idempotent Rebuilds is to be able to re-bootstrap a binary distribution like Debian from some other bootstrappable environment like Guix. In parallel to working on a achieving the 100% Idempotent Rebuild of Debian, we can setup a Guix environment that build Debian packages using Guix binaries. These builds ought to eventually converge to the same Debian binary packages, or there is something deeply problematic happening. This approach to re-bootstrap a binary distribution like Debian seems simpler than rebuilding all binaries going back to the beginning of time for that distribution.

What do you think?

PS. I fear that Debian main may have already went into a state where it is not able to rebuild itself at all anymore: the presence and assumption of non-free firmware and non-Debian signed binaries may have already corrupted the ability for Debian main to rebuild itself. To be able to complete the idempotent and bootstrapped rebuild of Debian, this needs to be worked out.

联系我们 contact @ memedata.com