On 31 Jan 2026, I gave a talk at FOSDEM 2026 on phantom binary dependencies — packages that we depend on in binary form, even though these dependency relationships are invisible to us. If we cannot reliably identify these phantom dependencies, the sustainability and security of our tech infrastructure will be at risk, which threatens critical services such as hospitals, transportation and the internet.
You can watch my talk on this page, and I’ve included more details below, as well as a list of resources for those who want to learn more about this topic.
Abstract
When you create a software package, your work might depend on other packages. Usually, you will depend on the source code of these other packages. However, sometimes, you will depend on precompiled binaries of your dependencies. This frequently happens when calling compiled code, like C code, from other programming languages, such as Python.
In almost all ecosystems, it is difficult to keep track of binary dependencies. When you depend on a package’s source code, this is normally recorded in your manifest file (pyproject.toml, package.json and so on). However, when you depend on a package’s precompiled binaries, this information is usually not recorded anywhere. This means that the binary dependency relationship between your project and whatever you’re depending on is hidden, so we can say that you have a phantom binary dependency.
You can find detailed technical information about how binary dependencies work in my article titled How Binary Dependencies Work Across Different Languages.
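To make the idea concrete, here is a minimal sketch of one way a phantom binary dependency can arise in Python. The helper name load_shared_library is hypothetical, and the choice of zlib is only an illustration. Loading a library at runtime through ctypes is just one route, and probably not the most common one: compiled extension modules that link against shared libraries hide their dependencies in a similar way. In either case, nothing in pyproject.toml records that the project needs the library.

```python
import ctypes
import ctypes.util


def load_shared_library(short_name):
    """Try to load a system shared library, e.g. "z" for zlib.

    Returns a ctypes.CDLL handle, or None if the library cannot be
    found or loaded on this machine.
    """
    # Ask the platform's usual lookup machinery where the library lives.
    path = ctypes.util.find_library(short_name)
    if path is None:
        return None
    try:
        # On POSIX systems, ctypes loads the library with dlopen().
        return ctypes.CDLL(path)
    except OSError:
        return None


# The dependency on zlib only becomes visible when this line runs.
zlib = load_shared_library("z")
if zlib is None:
    print("zlib was not found; no manifest entry would have warned us")
else:
    print("zlib loaded at runtime, with no record of it in pyproject.toml")
```

A dependency scanner that only reads manifest files would report no dependency on zlib for a project containing this code.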
Why are phantom binary dependencies important? For at least two reasons:
- Sustainability. Keystone maintainers struggle to get paid because of the Open Source sustainability crisis, which makes them more vulnerable to burnout. Projects like the Open Source Pledge, the Open Source Endowment, and thanks.dev help maintainers get paid. But we have to know which maintainers we depend on to be able to pay them. If we cannot identify our binary dependencies, we cannot identify which maintainers we should support, which puts the sustainability of the Open Source ecosystem at risk, threatening our global tech infrastructure.
- Security. If your project depends on a library, any security issues in that library will leave your project vulnerable. If you don’t have a clear picture of which libraries you depend on, you won’t have a clear picture of security vulnerabilities that might affect you, which puts your project at risk. For software that supports critical infrastructure like hospitals, transportation systems, and the internet, this vulnerability puts the public at risk of harm.
There are tools we can build, and systemic changes we can make to our package management ecosystems, that will correct these issues, though this will be a lot of work. In my talk, I sketch the beginnings of a solution.
I believe we should start by creating tools that can identify and record binary dependencies for a wide variety of packages, giving maintainers, security researchers, and other parties the information they need to figure out more robust solutions.
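As a rough illustration of what a first step in such a tool might look like, the sketch below lists the compiled files bundled inside a Python wheel, which is a zip archive. Everything here is an assumption made for illustration: the function names, the filename heuristics, and the demo archive are all made up, and the ".libs" and ".dylibs" directory checks reflect my understanding of where wheel repair tools usually place vendored libraries. Note that this only finds binaries shipped inside the wheel. Working out which external libraries those binaries require means reading the binary headers themselves, which is the job of tools like auditwheel and elfdeps.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Filename suffixes that usually indicate compiled code (a heuristic).
BINARY_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def is_binary_file(filename):
    """Return True if the name looks like a shared library or extension module."""
    name = PurePosixPath(filename).name
    # Versioned Linux libraries look like "libz.so.1.2.13", so check for ".so." too.
    return name.endswith(BINARY_SUFFIXES) or ".so." in name


def is_vendored(filename):
    """Return True if the file sits under a directory such as "demo.libs"."""
    parent_dirs = PurePosixPath(filename).parts[:-1]
    return any(part.endswith((".libs", ".dylibs")) for part in parent_dirs)


def find_binary_files(wheel):
    """Group the binary files inside a wheel, given a path or file object."""
    report = {"extension_modules": [], "vendored_libraries": []}
    with zipfile.ZipFile(wheel) as archive:
        for filename in sorted(archive.namelist()):
            if not is_binary_file(filename):
                continue
            key = "vendored_libraries" if is_vendored(filename) else "extension_modules"
            report[key].append(filename)
    return report


def build_demo_wheel():
    """Build a tiny in-memory archive with made-up contents, for illustration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo/__init__.py", "")
        archive.writestr("demo/_speedups.cpython-312-x86_64-linux-gnu.so", b"")
        archive.writestr("demo.libs/libz-a1b2c3d4.so.1.2.13", b"")
    buffer.seek(0)
    return buffer


print(find_binary_files(build_demo_wheel()))
```

Separating vendored libraries from extension modules matters because vendored copies are the ones most likely to miss security patches released through a package manager.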
Once we have this information, we can work on other aspects of the problem. How can we make sure that binary dependencies are always sourced from the appropriate package managers, instead of being merely vendored, so that they can be kept up to date with security patches? How can we make sure that language package managers and system package managers interoperate to warn developers of security issues across the entire dependency graph?
I’m actively involved in this work, and there are many conversations happening around these questions. If you’re interested, here are some resources to help you learn more and/or get involved!
Other Resources
- The bindep repository is where I keep my work-in-progress code and notes.
- The Open Source Pledge is an initiative aiming to get companies to pay the Open Source maintainers whose work they depend on. Pledge members have paid $6,879,498 to Open Source developers at the time of writing.
- The C-Shaped Hole in Package Management by Andrew Nesbitt, a great post about the problem of binary dependencies.
- Dependency Resolution in Python: Beware The Phantom Dependency by Anand Sawant is where the term phantom dependency was originally coined.
- auditwheel, a widely used Python package that can identify a wheel’s required dynamic libraries, but does not yet have accurate human-readable output, or an API for researchers and developers to use.
- Issue #676 on auditwheel, where I ask the maintainers whether they are interested in adding APIs that would give users and researchers more visibility into binary dependencies.
- elfdeps, a lesser-known Python package that can identify a wheel’s required dynamic libraries and does have a public API.
- PEP 770, in which author Seth Larson introduces SBOMs to Python packages. See also his PR #577 to auditwheel, which enables users to discover binary dependencies, and include them in a package’s SBOM.
- PEP 725, which specifies a way to record binary dependencies in pyproject.toml. This interoperable record of binary dependencies could allow many tools to, for example, flag security issues that were previously invisible.
- PEP 804, which specifies a system for mapping binary dependency names to packages in non-Python registries.
- Issue #1261 on the ecosyste.ms packages repo collects more information on strategies for tracking binary dependencies across multiple package managers.
- ESSTRA is a tool developed at Sony that aims to improve supply chain transparency by embedding metadata into binaries.
- Fromager is a tool that aims, among other things, to provide a way for Python packages to be built completely from source, which includes building all their binary dependencies from source. I am not sure whether this is actually implemented yet.
- UAPI specification 8 provides a section within ELF binaries for recording the provenance of a dynamic library, including the name of the system package it originated from. See also Fedora’s page about Package information on ELF objects.
- UAPI specification 12 provides a section within ELF binaries for recording the names of libraries loaded with dlopen(). This would hypothetically allow us to keep track of libraries opened with libffi/cffi. However, I’m not sure how these records would be filled in, since it’s possible for the names of libraries opened with dlopen() to not be known until runtime.
- In a paper titled Insight: Exploring Cross-Ecosystem Vulnerability Impacts, Xu et al describe a system for identifying when Python code calls into parts of C-based dynamic libraries that are known to have security vulnerabilities. According to their research, 24.0% of PyPI projects “transitively invoke vulnerable APIs from C libraries”. Note, however, that Xu et al analyse Python code that calls into binary dependencies using dlopen(), which is a notable caveat, since these kinds of FFI calls are not the most common way to call into binary dependencies. See my post on how binary dependencies work for more information.
- The PURL registry is a registry of packages, mainly C/C++ packages, that do not otherwise have a PURL because they are not clearly identified in package registries. Such a registry could be used for assigning reliable identities to binary dependencies.
- How to publish binaries on npm by Luca Forstner tells us that it’s possible, but tricky, to publish binary packages on npm.