人口普查会“嘭”的一声失效吗?
Pop Goes the Population Count?

原始链接: https://xania.org/202512/11-pop-goes-the-weasel-er-count

## 统计一位数:一个编译器优化故事 “种群计数”——确定一个数字中设置位(1)的数量——在数据压缩和密码学等领域出乎意料地有用。一个朴素的 C 实现涉及循环遍历每个位,但存在更有效的“分而治之”方法。一个巧妙的循环利用位清除技巧:从值中减去一,然后执行按位与,有效地在每次迭代中移除最低有效设置位。 然而,现代编译器可以做得*更好*。使用特定于架构的标志编译(例如,Intel 处理器的 `-march=westmere`)允许编译器将此循环识别为与单个 `popcnt` 指令等效。这个由 clang 的“循环删除通道”识别出的优化,极大地简化了代码。 虽然可以优化手写循环,但建议使用标准库函数(如 `std::popcount`)以提高清晰度并确保使用所需的指令。这个例子突出了编译器优化的强大力量,并且是“编译器优化之旅 2025”系列的一部分。

黑客新闻 新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 人口数量“砰”的一声下降? (xania.org) 7 分,由 hasheddan 发表于 2 小时前 | 隐藏 | 过去 | 收藏 | 讨论 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请YC | 联系 搜索:
相关文章

原文

Written by me, proof-read by an LLM.
Details at end.

Who among us hasn’t looked at a number and wondered, “How many one bits are in there?” No? Just me then?

Actually, this “population count” operation can be pretty useful in some cases like data compression algorithms, cryptography, chess, error correction, and sparse matrix representations. How might one write some simple C to return the number of one bits in an unsigned 64 bit value?

One way might be to loop 64 times, checking each bit and adding one if set. Or, equivalently, shifting that bit down and adding it to a running count: sometimes the population count operation is referred to as a “horizontal add” as you’re adding all the 64 bits of the value together, horizontally. There are “divide and conquer” approaches too, see the amazing Stanford Bit Twiddling Hacks page for a big list.

My favourite way is to loop while the value is non-zero, and use a cute trick to “clear the bottom set bit”. The loop count is then the number of set bits. How do you clear the bottom set bit? You and a value with itself decremented!

value       : 11010100
subtract 1  : 11010011
& value     : 11010000

If you try some examples on paper, you’ll see that subtracting one always moves the bottom set bit down by one place, setting all the bits from there down. Everything else is left the same. Then when you and, the bottom set bit is guaranteed to be and-ed with zero, but everything else remains. Great stuff!

All right, let’s see what the compiler makes of this:

The core loop is pretty much what we’d expect, using the lea trick to get value - 1, anding and counting:

.L3:
  lea rax, [rdi-1]          ; rax = value - 1
  add edx, 1                ; ++result
  and rdi, rax              ; value &= value - 1
  jne .L3                   ; ...while (value)

Great stuff, but we can do better. By default gcc and clang both target some kind of “generic” processor which influences which instructions they can use. We’re compiling for Intel here, and gcc’s default is somewhere around Intel’s “nocona” architecture, from 2004. Unless you are running vintage hardware you can probably change it to something better. Let’s pick the super up-to-date “westmere” (from 2010…) using -march=westmere and see what happens:

Wow! The entire routine has been replaced with a single instruction - popcnt rax, rdi. When I first saw this optimisation I was blown away: the compiler recognises a relatively complex loop as being functionally equivalent to a single instruction. Both gcc and clang can do this, and within Compiler Explorer you can use the optimisation pipeline viewer in clang to see that clang’s “loop deletion pass” is responsible for this trick:

Screenshot of CE showing the opt pipeline viewer with the loop being replaced with a call to @llvm.ctpop.i64
Compiler Explorer's Opt Pipeline View

Compilers canonicalise code too, so some similar population count code will also be turned into a single instruction, though sadly not all. In this case, it’s probably better to actually use a standard C++ routine to guarantee the right instruction as well as reveal your intention: std::popcount. But even if you don’t, the compiler might just blow your mind with a single instruction anyway.

See the video that accompanies this post.


This post is day 11 of Advent of Compiler Optimisations 2025, a 25-day series exploring how compilers transform our code.

This post was written by a human (Matt Godbolt) and reviewed and proof-read by LLMs and humans.

Support Compiler Explorer on Patreon or GitHub, or by buying CE products in the Compiler Explorer Shop.

联系我们 contact @ memedata.com