当编译器让你感到意外时
When Compilers Surprise You

原始链接: https://xania.org/202512/24-cunning-clang

这篇帖子详细介绍了 GCC 和 Clang 编译器对一个简单函数的惊人优化,该函数旨在计算到给定值的整数之和。 GCC 在 -O2 优化级别下,巧妙地优化循环以一次添加两个数字,识别出添加 `x` 和 `x+1` 等同于 `x*2 + 1` 的模式。在 -O3 级别下,它进一步向量化循环以进行并行加法。 然而,Clang 更进一步——它*完全消除了循环*!相反,它利用了整数求和的封闭形式数学解:`n(n-1)/2`。这会将算法从线性时间复杂度 (O(n)) 转换为常数时间复杂度 (O(1))。 作者是一位拥有 20 多年经验的编译器专家,他对这些优化表示惊叹,强调了现代编译器中蕴含的惊人深度和巧妙之处。这是“编译器优化历险记”系列中的第 25 天内容。

黑客新闻 新的 | 过去的 | 评论 | 提问 | 展示 | 工作 | 提交 登录 当编译器让你惊讶时 (xania.org) 19 分,来自 brewmarche 1 小时前 | 隐藏 | 过去的 | 收藏 | 3 评论 dejj 5 分钟前 | 下一个 [–] 很不错。我想知道是否有人尝试检测图着色问题,并将其替换为常量。回复 mgaunard 19 分钟前 | 上一个 [–] 这些只是基本且必要的优化,这里没有什么太令人惊讶的。整数之和实际上是我在面试中问开发人员的问题(从初级到高级都适用),额外的问题是如果使用浮点数代替整数会发生什么。回复 bayesnet 0 分钟前 | 父级 [–] 为了提供问题的第二部分答案,没有封闭形式的解。由于浮点数学不具有结合律,因此无法应用保留 O(n) 循环精确输出的 O(1) 优化。回复 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系 搜索:
相关文章

原文

Written by me, proof-read by an LLM.
Details at end.

Every now and then a compiler will surprise me with a really smart trick. When I first saw this optimisation I could hardly believe it. I was looking at loop optimisation, and wrote something like this simple function that sums all the numbers up to a given value:

So far so decent: GCC has done some preliminary checks, then fallen into a loop that efficiently sums numbers using lea (we’ve seen this before). But taking a closer look at the loop we see something unusual:

.L3:
  lea edx, [rdx+1+rax*2]        ; result = result + 1 + x*2
  add eax, 2                    ; x += 2
  cmp edi, eax                  ; x != value
  jne .L3                       ; keep looping

The compiler has cleverly realised it can do two numbers at a time using the fact it can see we’re going to add x and x + 1, which is the same as adding x*2 + 1. Very cunning, I think you’ll agree!

If you turn the optimiser up to -O3 you’ll see the compiler works even harder to vectorise the loop using parallel adds. All very clever.

This is all for GCC. Let’s see what clang does with our code:

This is where I nearly fell off my chair: there is no loop! Clang checks for positive value, and if so it does:

  lea eax, [rdi - 1]        ; eax = value - 1
  lea ecx, [rdi - 2]        ; ecx = value - 2
  imul rcx, rax             ; rcx = (value - 1) * (value - 2)
  shr rcx                   ; rcx >>= 1
  lea eax, [rdi + rcx]      ; eax = value + rcx
  dec eax                   ; --eax
  ret

It was not at all obvious to me what on earth was going on here. By backing out the maths a little, this is equivalent to:

v + ((v - 1)(v - 2) / 2) - 1;

Expanding the parentheses:

v + (v² - 2v - v + 2) / 2 - 1

Rearranging a bit:

(v² - 3v + 2) / 2 + (v - 1)

Multiplying the (v - 1) by 2 / 2:

(v² - 3v + 2) / 2 + (2v - 2)/2

Combining those and cancelling:

Simplifying and factoring gives us v(v - 1) / 2 which is the closed-form solution to the “sum of integers”! Truly amazing - we’ve gone from an O(n) algorithm as written, to an O(1) one!

I love that despite working with compilers for more than twenty years, they can still surprise and delight me. The years of experience and work that have been poured into making compilers great is truly humbling, and inspiring.

We’re nearly at the end of this series - there’s so much more to say but that will have to wait for another time. Tomorrow will be a little different: see you then!

See the video that accompanies this post.


This post is day 24 of Advent of Compiler Optimisations 2025, a 25-day series exploring how compilers transform our code.

This post was written by a human (Matt Godbolt) and reviewed and proof-read by LLMs and humans.

Support Compiler Explorer on Patreon or GitHub, or by buying CE products in the Compiler Explorer Shop.

联系我们 contact @ memedata.com