Binary Dependencies: Identifying the Hidden Packages We All Depend On

Original link: https://vlad.website/binary-dependencies-identifying-the-hidden-packages-we-all-depend-on/

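## Phantom Binary Dependencies: A Growing Threat

A talk at FOSDEM 2026 highlighted a critical problem called "phantom binary dependencies": depending on other packages' precompiled code (binaries) without those dependency relationships being explicitly recorded in the project's manifest. Such hidden dependencies put software sustainability and security at risk. Package managers currently track source-code dependencies but usually ignore these binary links, which often arise when languages like Python call compiled code, such as C. If these dependencies are not identified, the original developers cannot be supported through initiatives such as the Open Source Pledge, which threatens the long-term health of the Open Source ecosystem. Unrecorded binary dependencies also create security blind spots. If a binary a project depends on has a flaw, the project is exposed without knowing it, and the impact can reach critical infrastructure such as hospitals and the internet. The proposed solution starts with building tools that identify and record these binary dependencies, paving the way for better security warnings and sustainable funding models for maintainers. Several projects and proposals, such as PEP 770 and PEP 804 in Python, are already underway to address this challenge and improve software supply-chain transparency.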


Original article


On 31 Jan 2026, I gave a talk at FOSDEM 2026 on phantom binary dependencies — packages that we depend on in binary form, even though these dependency relationships are invisible to us. If we cannot reliably identify these phantom dependencies, the sustainability and security of our tech infrastructure will be at risk, which threatens critical services such as hospitals, transportation and the internet.

You can watch my talk on this page, and I’ve included more details below, as well as a list of resources for those who want to learn more about this topic.

Abstract

When you create a software package, your work might depend on other packages. Usually, you will depend on the source code of these other packages. However, sometimes, you will depend on precompiled binaries of your dependencies. This frequently happens when calling compiled code, like C code, from other programming languages, such as Python.
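To make this concrete, here is a minimal sketch that calls the C library's `strlen()` from Python through the standard-library `ctypes` module. It assumes a POSIX system (Linux or macOS) where `ctypes.util.find_library("c")` can locate the C library. The function name `c_strlen` is ours, for illustration only. Loading a library at runtime like this is the simplest case to show, although, as the article notes further on, this kind of `dlopen()`-based FFI call is not the most common route into binary dependencies.

```python
import ctypes
import ctypes.util


def c_strlen(text: str) -> int:
    """Call the C library's strlen() from Python via ctypes.

    Nothing in a manifest file (pyproject.toml, etc.) records that this code
    needs the C library: the dependency exists only at the binary level.
    """
    # find_library() returns a name such as "libc.so.6" on Linux, or None if it
    # cannot locate the library; CDLL(None) then falls back to the symbols
    # already loaded into the current process (POSIX only).
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    # strlen() counts bytes, so the text is encoded to UTF-8 first.
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("phantom"))  # 7
```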

In almost all ecosystems, it is difficult to keep track of binary dependencies. When you depend on a package’s source code, this is normally recorded in your manifest file — pyproject.toml, package.json and so on. However, when you depend on a package’s precompiled binaries, this information is usually not recorded anywhere. This means that the binary dependency relationship between your project and whatever you’re depending on is hidden — so we can say that you have a phantom binary dependency.

You can find detailed technical information about how binary dependencies work in my article titled How Binary Dependencies Work Across Different Languages.

Why are phantom binary dependencies important? For at least two reasons:

Sustainability. Keystone maintainers struggle to get paid because of the Open Source sustainability crisis, which makes them more vulnerable to burnout. Projects like the Open Source Pledge, the Open Source Endowment, and thanks.dev help maintainers get paid. But we have to know which maintainers we depend on to be able to pay them. If we cannot identify our binary dependencies, we cannot identify which maintainers we should support, which puts the sustainability of the Open Source ecosystem at risk, threatening our global tech infrastructure.
Security. If your project depends on a library, any security issues in that library will leave your project vulnerable. If you don’t have a clear picture of which libraries you depend on, you won’t have a clear picture of security vulnerabilities that might affect you, which puts your project at risk. For software that supports critical infrastructure like hospitals, transportation systems, and the internet, this vulnerability puts the public at risk of harm.

There are tools we can build, and systemic changes we can make to our package management ecosystems, that will correct these issues, though this will be a lot of work. In my talk, I sketch the beginnings of a solution.

I believe we should start by creating tools that can identify and record binary dependencies for a wide variety of packages, giving maintainers, security researchers, and other parties the information they need to figure out more robust solutions.
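Here is a rough sketch of what such a tool does at its core on Linux. The function below reads the `DT_NEEDED` entries of an ELF shared object or executable. These entries name the shared libraries that the dynamic linker must load. This is a simplified stand-in, not how auditwheel or elfdeps are implemented. It assumes a 64-bit little-endian ELF file with its section headers intact, does not follow transitive dependencies, and cannot see libraries opened later with `dlopen()`. The function names `needed_from_bytes` and `needed_libraries` are hypothetical.

```python
import struct

SHT_DYNAMIC = 6  # section type of the dynamic linking table (.dynamic)
DT_NULL = 0      # tag marking the end of the dynamic table
DT_NEEDED = 1    # tag naming a shared library the object requires


def needed_from_bytes(data: bytes) -> list[str]:
    """Return the DT_NEEDED library names from a 64-bit little-endian ELF image."""
    # e_ident: magic bytes, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian).
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF files are handled here")
    # ELF64 file header: we only need the location and shape of the section header table.
    (_, _, _, _, _, _, e_shoff, _, _, _, _,
     e_shentsize, e_shnum, _) = struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)
    # Each ELF64 section header: name, type, flags, addr, offset, size, link,
    # info, addralign, entsize. (Files using extended section numbering, where
    # e_shnum is 0, yield no sections in this simplified reader.)
    sections = [
        struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    needed = []
    for sh in sections:
        sh_type, sh_offset, sh_size, sh_link = sh[1], sh[4], sh[5], sh[6]
        if sh_type != SHT_DYNAMIC:
            continue
        # sh_link is the index of the string table (.dynstr) holding library names.
        strtab_offset = sections[sh_link][4]
        # Dynamic entries are (d_tag: int64, d_val: uint64) pairs, 16 bytes each.
        for pos in range(sh_offset, sh_offset + sh_size, 16):
            d_tag, d_val = struct.unpack_from("<qQ", data, pos)
            if d_tag == DT_NULL:
                break
            if d_tag == DT_NEEDED:
                start = strtab_offset + d_val
                end = data.index(b"\x00", start)
                needed.append(data[start:end].decode("utf-8"))
    return needed


def needed_libraries(path: str) -> list[str]:
    """Read a file from disk and list the shared libraries it declares as needed."""
    with open(path, "rb") as f:
        return needed_from_bytes(f.read())
```

On a glibc-based Linux system, `needed_libraries(sys.executable)` will typically include `libc.so.6`, though the exact result depends on how Python was built. Recording this output alongside a package's metadata is the kind of information the tools described above would need to collect.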

Once we have this information, we can work on other aspects of the problem. How can we make sure that binary dependencies are always sourced from the appropriate package managers, instead of being merely vendored, so that they can be kept up to date with security patches? How can we make sure that language package managers and system package managers interoperate to warn developers of security issues across the entire dependency graph?
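Recording binary edges pays off as soon as we ask which projects a vulnerable library affects. The minimal sketch below uses an invented dependency graph. Some of its edges are declared in manifests ("source"), and others are phantom binary dependencies ("binary"). The sketch walks the graph backwards from a flagged library with a breadth-first search. The package names, the graph, and the `affected_by` helper are all hypothetical. Real tooling would also need version ranges and advisory data.

```python
from collections import deque

# Hypothetical dependency graph: package -> [(dependency, edge kind)].
# "source" edges are recorded in manifests; "binary" edges are phantom
# dependencies that only show up by inspecting compiled artifacts.
EDGES = {
    "my-app": [("web-framework", "source"), ("image-lib", "source")],
    "web-framework": [("http-client", "source")],
    "image-lib": [("libpng", "binary")],
    "http-client": [("libssl", "binary")],
    "libpng": [("libz", "binary")],
}


def affected_by(target, edges, include_binary=True):
    """Return every package that transitively depends on `target`."""
    # Invert the graph so we can walk from the vulnerable library upward.
    reverse = {}
    for pkg, deps in edges.items():
        for dep, kind in deps:
            if kind == "binary" and not include_binary:
                continue
            reverse.setdefault(dep, []).append(pkg)
    # Breadth-first search over dependents.
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in reverse.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


print(affected_by("libz", EDGES))                        # ['image-lib', 'libpng', 'my-app']
print(affected_by("libz", EDGES, include_binary=False))  # []
```

When the binary edges are included, a flaw in `libz` reaches `my-app`. When only manifest-declared edges are used, the same flaw is invisible to `my-app`.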

I’m actively involved in this work, and there are many conversations happening around these questions. If you’re interested, here are some resources to help you learn more and/or get involved!

Other Resources

The bindep repository is where I keep my work-in-progress code and notes.
The Open Source Pledge is an initiative aiming to get companies to pay the Open Source maintainers whose work they depend on. Pledge members have paid $6,879,498 to Open Source developers at the time of writing.
The C-Shaped Hole in Package Management by Andrew Nesbitt, a great post about the problem of binary dependencies.
Dependency Resolution in Python: Beware The Phantom Dependency by Anand Sawant is where the term phantom dependency was originally coined.
auditwheel, a widely used Python package that can identify a wheel’s required dynamic libraries, but does not yet have accurate human-readable output or an API for researchers and developers to use.
Issue #676 on auditwheel, where I ask the maintainers whether they are interested in adding APIs that would give users and researchers more visibility into binary dependencies.
elfdeps, a lesser-known Python package that can identify a wheel’s required dynamic libraries and does have a public API.
PEP 770, in which author Seth Larson introduces SBOMs to Python packages. See also his PR #577 to auditwheel, which enables users to discover binary dependencies, and include them in a package’s SBOM.
PEP 725, which specifies a way to record binary dependencies in pyproject.toml. This interoperable record of binary dependencies could allow many tools to, for example, flag security issues that were previously invisible.
PEP 804, which specifies a system for mapping binary dependency names to packages in non-Python registries.
Issue #1261 on the ecosyste.ms packages repo collects more information on strategies for tracking binary dependencies across multiple package managers.
ESSTRA is a tool developed at Sony that aims to improve supply chain transparency by embedding metadata into binaries.
Fromager is a tool that aims, among other things, to provide a way for Python packages to be built completely from source, which includes building all their binary dependencies from source. I am not sure whether this is actually implemented yet.
UAPI specification 8 provides a section within ELF binaries for recording the provenance of a dynamic library, including the name of the system package it originated from. See also Fedora’s page about Package information on ELF objects.
UAPI specification 12 provides a section within ELF binaries for recording the names of libraries loaded with dlopen(). This would hypothetically allow us to keep track of libraries opened with libffi/cffi. However, I’m not sure how these records would be filled in, since it’s possible for the names of libraries opened with dlopen() to not be known until runtime.
In a paper titled Insight: Exploring Cross-Ecosystem Vulnerability Impacts, Xu et al. describe a system for identifying when Python code calls into parts of C-based dynamic libraries that are known to have security vulnerabilities. According to their research, 24.0% of PyPI projects “transitively invoke vulnerable APIs from C libraries”. Note, however, that Xu et al. analyse Python code that calls into binary dependencies using dlopen(), which is a notable caveat, since these kinds of FFI calls are not the most common way to call into binary dependencies. See my post on how binary dependencies work for more information.
The PURL registry is a registry of packages, mainly C/C++ packages, that do not otherwise have a PURL because they are not clearly identified in package registries. Such a registry could be used for assigning reliable identities to binary dependencies.
How to publish binaries on npm by Luca Forstner tells us that it’s possible, but tricky, to publish binary packages on npm.
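Package URLs (PURLs) have the general shape `pkg:type/namespace/name@version`. The `generic` type covers software that has no dedicated package registry, which includes many C libraries. The sketch below builds such identifiers with a hypothetical helper, `make_purl`. It percent-encodes each component but omits the qualifiers, the subpath, and the type-specific normalization rules of the PURL specification, so treat it as an illustration rather than a spec-compliant implementation. The OpenSSL version shown is only an example value.

```python
from urllib.parse import quote


def make_purl(purl_type, name, version=None, namespace=None):
    """Build a simplified Package URL: pkg:type/namespace/name@version.

    Hypothetical helper: qualifiers (?key=value), subpaths (#path) and
    per-type name normalization from the PURL specification are left out.
    """
    parts = ["pkg:" + purl_type.lower()]
    if namespace:
        # Percent-encode each namespace segment, keeping "/" as the separator.
        parts.append("/".join(quote(seg, safe="") for seg in namespace.split("/")))
    parts.append(quote(name, safe=""))
    purl = "/".join(parts)
    if version:
        purl += "@" + quote(version, safe="")
    return purl


print(make_purl("generic", "openssl", "3.0.13"))                      # pkg:generic/openssl@3.0.13
print(make_purl("npm", "animation", "12.3.1", namespace="@angular"))  # pkg:npm/%40angular/animation@12.3.1
```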