Shrinking While Linking

Original link: https://www.tweag.io/blog/2025-11-27-shrinking-static-libs/

## Shrinking a Static Library: A Size-Reduction Journey

Distributing static libraries can be challenging because their files are often large. Conventional wisdom says that dead code gets optimized away when linking the final binary, but that does nothing when you want to *distribute* the library itself. This post describes a successful effort to dramatically shrink a Rust static library built for use from Go.

The library started out at 132MB, over GitHub's file size limit. The core problem: a static library bundles everything, including all of its dependencies, optimization data such as LLVM bitcode, and metadata whose only purpose is to help the linker clean things up later.

The author applies several techniques. Re-linking the object files with `--relocatable` and `--gc-sections` removes unused code, bringing the size down to 107MB. Further gains come from stripping the LLVM bitcode and debug information and merging many tiny per-function sections into larger ones, reaching 19MB. A MacOS-compatible workflow based on LLVM tools achieves a similar result, working around the missing `--gc-sections` support. Finally, the author points to dragonfire, a tool built for deduplicating static libraries, as another potential avenue for size reduction.

The key takeaway: careful manipulation of object files and linker flags yields substantial size reductions, while balancing size against debuggability and future optimization.

Original Article

If you’re anxious about the size of your binary, there’s a lot of useful advice on the internet to help you reduce it. In my experience, though, people are reticent to discuss their static libraries. If they’re mentioned at all, you’ll be told not to worry about their size: dead code will be optimized away when linking the final binary, and the final binary size is what matters.

But that advice didn’t help me, because I wanted to distribute a static library and the size was causing me problems. Specifically, I had a Rust library that I wanted to make available to Go developers. Both Rust and Go can interoperate with C, so I compiled the Rust code into a C-compatible library and made a little Go wrapper package for it. Like most pre-compiled C libraries, I can distribute it either as a static or a dynamic library. Now Go developers are accustomed to static linking, which produces self-contained binaries that are refreshingly easy to deploy. Bundling a pre-compiled static library with our Go package allows Go developers to just go get https://github.com/nickel-lang/go-nickel and get to work. Dynamic libraries, on the other hand, require runtime dependencies, linker paths, and installation instructions.
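
To make the "just go get it and get to work" story concrete, here is roughly what the downstream experience looks like, assuming the pre-compiled libnickel_lang.a ships inside the wrapper package and its cgo directives point at it (the program and binary names below are illustrative, not taken from the actual package):

# Add the wrapper package; the bundled static library comes along with it.
go get github.com/nickel-lang/go-nickel

# Build as usual: cgo links libnickel_lang.a into the executable at build time,
# so the Rust code travels inside the binary itself.
go build -o myprog .

# No libnickel_lang.so is needed at runtime; only the usual system libraries
# show up (or none at all, for a fully static build).
ldd ./myprog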

So I really wanted to go the static route, even if it came with a slight size penalty. How large of a penalty are we talking about, anyway?

❯ ls -sh target/release/
132M libnickel_lang.a
15M  libnickel_lang.so

😳 Ok, that’s too much. Even if I were morally satisfied with 132MB of library, it’s way beyond GitHub’s 50MB file size limit. (Honestly, even the 15M shared library seems large to me; we haven’t put much effort into optimizing code size yet.)

The compilation process in a nutshell

Back in the day, your compiler or assembler would turn each source file into an “object” file containing the compiled code. In order to allow for source files to call functions defined in other source files, each object file could announce the list of functions that it defines, and the list of functions that it very much hopes someone else will define. Then you’d run the linker, a program that takes all those object files and mashes them together into a binary, matching up the hoped-for functions with actual function definitions or yelling “undefined symbol” if it can’t. Modern compiled languages tweak this pipeline a little: Rust produces an object file per crate instead of one per source file. But the basics haven’t changed much.
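
You can see both lists by dumping an object file's symbol table. A small sketch using nm (the file name and symbols here are made up for illustration):

# 'T' marks a function this object file defines; 'U' marks one it hopes
# someone else will provide at link time.
nm --defined-only foo.o     # e.g. 0000000000001120 T foo_init
nm --undefined-only foo.o   # e.g.                  U malloc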

A static library is nothing but a bundle of object files, wrapped in an ancient and never-quite-standardized archive format. No linker is involved in the creation of a static library: it will be used eventually to link the static library into a binary. The unfortunate consequence is that a static library contains a lot of information that we don’t want. For a start, it contains all the code of all our dependencies even if much of that code is unused. If you compiled your code with support for link-time optimization (LTO), it contains another copy (in the form of LLVM bitcode — more on that later) of all our code and the code of all our dependencies. And then because it has so much redundant code, it contains a bunch of metadata (section headers) to make it easier for the linker to remove that redundant code later. The underlying reason for all this is that extra fluff in object files isn’t usually considered a problem: it’s removed when linking the final binary (or shared library), and that’s all that most people care about.
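
Because a static library is just an archive, you can poke at its members directly with ar. A quick illustration (the member names are hypothetical; real Rust archives contain many compilation-unit objects with hashed names):

# List the object files bundled inside the archive.
ar t libnickel_lang.a
# nickel_lang-1a2b3c.rcgu.o
# serde-4d5e6f.rcgu.o
# ...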

Re-linking with ld

I wrote above that a linker takes a bunch of object files and mashes them together into a binary. Like everything in the previous section, this was an oversimplification: if you pass the --relocatable flag to your linker, it will mash your object files together but write out the result as an object file instead of a binary. If you also pass the --gc-sections flag, it will remove unused code while doing so.

This gives us a first strategy for shrinking a static archive:

  • unpack the archive, retrieving all the object files
  • link them all together into a single large object, removing unused code. In this step we need to tell the linker which code is used, and then it will remove anything that can’t be reached from the used code.
  • pack that single object back into a static library

# Unpack the archive, retrieving all the object files.
ar x libnickel_lang.a

# Link them into a single relocatable object, using -u to name the symbols that
# make up the public API; everything unreachable from them is discarded.
ld --relocatable --gc-sections -o merged.o *.o -u nickel_context_alloc -u nickel_context_free ...

# Pack that single object back into a static library.
ar rcs libsmaller_nickel_lang.a merged.o

This helps a bit: the archive size went from 132MB to 107MB. But there’s clearly still room for improvement.
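
Before deciding what to cut next, it helps to see where the bytes actually are. A sketch of one way to do that, assuming GNU binutils (llvm-size accepts the same flags):

# Print one line per section with its size; sorting puts the biggest last.
size -A -d merged.o | sort -k2 -n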

Examining our merged object file with the size command, the largest section by far, weighing in at 84MB, is .llvmbc. Remember I wrote that we'd come back to the LLVM bitcode? Well, when you compile something with LLVM (and the Rust compiler uses LLVM), it converts the original source code into an intermediate representation, then it converts the intermediate representation into machine code, and then it writes both the intermediate representation and the machine code into an object file. It keeps the intermediate representation around in case it has useful information for further optimization at link time. Even if that information is useful, it isn't 84MB useful. Out it goes:

objcopy --remove-section .llvmbc merged.o without_llvmbc.o

The next biggest sections contain debug information. Those might be useful, but we’ll remove them for now just to see how small we can get.

strip --strip-unneeded without_llvmbc.o -o stripped.o

At this point there aren’t any giant sections left. But there are more than 48,000 small sections. It turns out that the Rust compiler puts every single tiny function into its own little section within the object file. It does this to help the linker remove unused code: remember the --gc-sections argument to ld? It removes unused sections, and so if the sections are small then unused code can be removed precisely. But we’ve already removed unused code, and each of those 48,000 section headers is taking up space.
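
You can see this for yourself by counting the per-function sections (a sketch assuming GNU binutils; llvm-readelf accepts the same flags):

# Count how many separate .text.* sections the object file carries.
readelf --wide --section-headers stripped.o | grep -c '\.text\.'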

To reclaim that space, we write a linker script that tells ld to merge sections together. The meaning of the various sections isn't important here: the point is that we're merging sections with names like .text._ZN11nickel_lang4Expr7to_json17h and .text._ZN11nickel_lang4Expr7to_yaml17h into a single big .text section.


SECTIONS
{
  .text :
  {
    *(.text .text.*)
  }

  .rodata :
  {
    *(.rodata .rodata.*)
  }

  /* ...and similar rules for the remaining per-function section families */
}

And we use it like this:

ld --relocatable --script merge.ld stripped.o -o without_tiny_sections.o

Let’s take a look back at what we did to our archive, and how much it helped:

| Step | Size |
| --- | --- |
| original | 132MB |
| linked with --gc-sections | 107MB |
| removed .llvmbc | 33MB |
| stripped | 25MB |
| merged sections | 19MB |

It’s probably possible to continue, but this is already a big improvement. We got rid of more than 85% of our original size!

We did lose something in the last two steps, though. Stripping the debug information might make backtraces less useful, and merging the sections removes the ability for future linking steps to remove unused code from the final binaries. In our case, our library has a relatively small and coarse API; I checked that as soon as you use any non-trivial function, less than 150KB of dead code remains. But you’ll need to decide for yourself whether these costs are worth the size reduction.
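
One way to estimate that cost yourself is to link the same minimal test program against both the merged library and the original one, with dead-code removal enabled, and compare the resulting binaries. A rough sketch (tiny.c is a hypothetical program calling one public API function; the extra system libraries a Rust static library needs vary by platform):

# Link the same tiny program against each version of the library.
cc -o with_merged   tiny.c libsmaller_nickel_lang.a -Wl,--gc-sections -lpthread -ldl -lm
cc -o with_original tiny.c libnickel_lang.a         -Wl,--gc-sections -lpthread -ldl -lm

# The size difference approximates the dead code that the final link can no
# longer remove once the sections have been merged.
ls -sh with_merged with_original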

More portability with LLVM bitcode

I was reasonably pleased with the outcome of the previous section until I tried to port it to MacOS, because it turns out that the MacOS linker doesn’t support --gc-sections (it has a -dead_strip option, but it’s incompatible with --relocatable because apparently no one cares about code size unless they’re building a binary). After drafting this post but before publishing it, I found this nice post on shrinking MacOS static libraries using the toolchain from XCode. I’m no MacOS expert so I’m probably using it wrong, but I only got down to about 25MB (after stripping) using those tools. (If you know how to do better, let me know!)

But there is another way! Remember that we had two copies of all our code: the LLVM intermediate representation and the machine code. Last time, we chucked out the intermediate representation and used the machine code. But since I don’t know how to massage the machine code on MacOS, we can work with the intermediate representation instead.

The first step is to extract the LLVM bitcode and throw out the rest. (The section name on MacOS is __LLVM,__bitcode instead of .llvmbc like it was on Linux.)

for obj_file in ./*.o; do
  llvm-objcopy --dump-section=__LLVM,__bitcode="$obj_file.bc" "$obj_file"
done

Then we combine all the little bitcode files into one gigantic one:

llvm-link -o merged.bc ./*.bc

And we remove the unused code by telling LLVM which functions make up the public API. We ask it to “internalize” every function that isn’t in the list, and to remove code that isn’t reachable from a public function (the “dce” in “globaldce” stands for “dead-code elimination”).

opt \
  --internalize-public-api-list=nickel_context_alloc,... \
  --passes='internalize,globaldce' \
  -o small.bc \
  merged.bc

Finally, we recompile the result back into an object file and pop it into a static library. llc turns the LLVM bitcode back into machine code, so the resulting object file can be consumed by non-LLVM toolchains.

llc --filetype=obj --relocation-model=pic small.bc -o small.o
ar rcs libsmaller_nickel_lang.a small.o

The result is a 19MB static library, pretty much the same as the other workflow. Note that we don’t need the section-merging step here, because we didn’t ask llc to generate a section per function.
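
As a quick sanity check (a sketch, not part of the original workflow), you can confirm that the recompiled object carries a single code section rather than one per function, and that no bitcode is left behind:

# List the sections of the recompiled object.
llvm-objdump --section-headers small.o

# Confirm how big the repacked archive ended up.
ls -sh libsmaller_nickel_lang.a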

Dragonfire

Soon after drafting this post, I learned about dragonfire, a recently-released and awesomely-named tool for shrinking collections of static libs by pulling out and deduplicating object files. I don't think this post's techniques can be combined with theirs for extra savings, because you can't both deduplicate and merge object files (I guess in principle you could deduplicate some and merge others, if you have very specific needs). But it's a great read, and I was gratified to discover that someone else shared my giant-Rust-static-library concerns.

Conclusion

We saw two ways to significantly reduce the size of a static library, one using classic tools like ld and objcopy and another using LLVM-specific tools. They both produced similar-sized outputs, but as with everything in life there are some tradeoffs. The "classic" bintools approach works with both GNU bintools and LLVM bintools, and it's significantly faster (a few seconds, compared to a minute or so) than the LLVM tools, which need to recompile everything from the intermediate representation to machine code. Moreover, the bintools approach should work with any static library, not just one compiled with an LLVM-based toolchain.

On the other hand, the LLVM approach works on MacOS (and Linux, Windows, and probably others). For this reason alone, this is the way we’ll be building our static libraries for Nickel.
