Addressing the adding situation – Matt Godbolt

Original link: https://xania.org/202512/02-adding-integers

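## x86 integer addition: compiler tricks

x86 assembly handles integer addition differently from architectures such as ARM. ARM has a direct "result = operand1 + operand2" instruction, but x86's `add` modifies its first operand (`lhs += rhs`) and has no separate destination register.

Compilers get around this limitation by making clever use of x86's powerful memory addressing system. A `mov` instruction can access memory directly, with no dedicated load/store instructions, and the addressing modes allow fairly complex calculations within a single instruction.

The `lea` (Load Effective Address) instruction *calculates* a memory address without actually accessing it. This can be used for addition: by expressing the sum as a memory address offset by an operand, the addressing hardware performs the addition.

This `lea` trick effectively provides three-operand addition: it lets the compiler choose the destination register and preserves the original operand values. It performs a 64-bit calculation even for a 32-bit addition, but the extra bits are discarded. The technique often saves an instruction and improves performance, and compilers use it automatically wherever it pays off.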


Original article

Written by me, proof-read by an LLM.
Details at end.

Yesterday we saw how compilers zero registers efficiently. Today let’s look at something a tiny bit less trivial (though not by much): adding two integers. What do you think a simple x86 function to add two ints would look like? An add, right? Let’s take a look!
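For reference, here is a minimal sketch of the kind of function in question, assuming the obvious C version (the name `add` is illustrative). Under the x86-64 System V calling convention the arguments arrive in `edi` and `esi` and the result is returned in `eax`. The assembly in the comment is what GCC and Clang typically produce at `-O2`; details can vary with compiler version and flags.

```c
/* Add two ints and return the sum.
 * With GCC or Clang at -O2 on x86-64 (Intel syntax), this typically
 * compiles to:
 *
 *     add:
 *         lea eax, [rdi+rsi]   ; eax = edi + esi, inputs left untouched
 *         ret
 */
int add(int x, int y) {
    return x + y;
}
```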

Probably not what you were thinking, right? x86 is unusual in mostly having a maximum of two operands per instruction. There’s no add instruction to add edi to esi, putting the result in eax. On an ARM machine this would be a simple add r0, r0, r1 or similar, as ARM has a separate destination operand. On x86, things like add are not result = lhs + rhs but lhs += rhs. This can be a limitation, as we don’t get to control which register the result goes into, and we in fact lose the old value of lhs.
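A small C model can make the difference between the two styles concrete. Both functions below are hypothetical illustrations, not real instructions or library calls. `x86_style_add` mirrors the destructive two-operand `lhs += rhs` form, and `arm_style_add` mirrors ARM's separate-destination `result = lhs + rhs` form. Unsigned types are used so that overflow wraps, as it does in the hardware.

```c
#include <stdint.h>

/* Hypothetical model of x86's two-operand `add lhs, rhs`:
 * the destination is also the first source, so its old value is lost. */
void x86_style_add(uint32_t *lhs, uint32_t rhs) {
    *lhs += rhs;
}

/* Hypothetical model of ARM's three-operand `add rd, rn, rm`:
 * the result goes to a separate destination; both inputs survive. */
uint32_t arm_style_add(uint32_t rn, uint32_t rm) {
    return rn + rm;
}
```

Without a three-operand form, putting `edi + esi` into `eax` on x86 takes two instructions: `mov eax, edi` to copy the first input, then `add eax, esi`.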

So how do compilers work around this limitation? The answer lies in an unexpected place - the sophisticated memory addressing system of the x86. Nearly every operand can be a memory reference - there’s no specific “load” or “store”; a mov can just refer to memory directly. Those memory references are pretty rich: you can refer to memory addressed by a constant, relative to a register, or relative to a register plus a second register (optionally multiplied by 1, 2, 4 or 8) plus a constant offset. Something like add eax, dword ptr [rdi + rsi * 4 + 0x1000] is still a single instruction!
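The addressing arithmetic itself is simple: base + index × scale + displacement, where the scale is 1, 2, 4 or 8 and the displacement is a signed constant encoded in the instruction. As a sketch, the hypothetical helper `effective_address` below models that calculation in C. In real hardware these are fields of the instruction encoding, not function arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an x86-64 effective address:
 *   base + index * scale + displacement
 * The scale may only be 1, 2, 4 or 8, and the displacement is a signed
 * 32-bit constant (sign-extended to 64 bits). Arithmetic wraps modulo 2^64. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

The example operand `[rdi + rsi * 4 + 0x1000]` corresponds to `effective_address(rdi, rsi, 4, 0x1000)`.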

Sometimes you don’t want to access the memory at one of these complex addresses, you just want to calculate what the address would be. Sort of like C’s “address-of” (&) operator. That’s what lea (Load Effective Address) does: it calculates the address without touching memory.
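In C terms, the difference is between reading `p[i]` and taking `&p[i]`. The two hypothetical functions below use the same address expression. With GCC or Clang at `-O2` on x86-64, the first typically becomes a `mov` that loads from memory and the second an `lea` that only computes the address. Exact output depends on compiler and flags.

```c
#include <stddef.h>

/* Reads memory: typically compiles to a load such as
 *     mov eax, dword ptr [rdi + rsi*4]                */
int load_elem(const int *p, ptrdiff_t i) {
    return p[i];
}

/* Same address calculation, but no memory access: typically
 *     lea rax, [rdi + rsi*4]                          */
const int *addr_of_elem(const int *p, ptrdiff_t i) {
    return &p[i];
}
```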

Why is this useful for addition? Well, if we’re not actually accessing memory, we can abuse the addressing hardware as a calculator! That complex addressing mode with its register-plus-register-times-scale is really just shifting and adding - so lea becomes a cheeky way to do three-operand addition.
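The base and index can be the same register, and the index can be scaled, so that "calculator" covers more than plain addition. As a sketch with hypothetical function names, x86-64 GCC and Clang at `-O2` commonly turn each of these into a single `lea`. The comments show the typical form, but the compiler is free to choose otherwise.

```c
/* x * 3 == x + x*2: typically  lea eax, [rdi + rdi*2] */
int times3(int x) { return x * 3; }

/* x * 5 == x + x*4: typically  lea eax, [rdi + rdi*4] */
int times5(int x) { return x * 5; }

/* x * 9 == x + x*8: typically  lea eax, [rdi + rdi*8] */
int times9(int x) { return x * 9; }

/* Two registers plus a constant: typically  lea eax, [rdi + rsi + 10] */
int add_plus_ten(int a, int b) { return a + b + 10; }
```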

The compiler writes our simple addition as lea eax, [rdi + rsi]: the address of memory at rdi offset by rsi. We get a full add of two registers and we get to specify the destination too. You’ll notice that the operands are referenced as rdi and rsi (the 64-bit versions) even though we only wanted a 32-bit add: because we are using the memory addressing system, it unconditionally calculates a 64-bit address. However, in this case it doesn’t matter; those top bits are discarded when the result is written to the 32-bit eax.
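The discarded upper bits are harmless because the low 32 bits of a sum depend only on the low 32 bits of the inputs: carries only ever propagate upwards. The hypothetical `lea_eax_rdi_rsi` below models this in C. It adds full 64-bit values and truncates the result, and it is compared with a plain 32-bit wrapping add. The two agree even when the upper halves of the inputs contain unrelated bits.

```c
#include <stdint.h>

/* Hypothetical model of `lea eax, [rdi + rsi]`: the address hardware adds
 * the full 64-bit registers, then writing to eax keeps only the low 32 bits.
 * (On real hardware, a 32-bit write also zeroes the upper half of rax.) */
uint32_t lea_eax_rdi_rsi(uint64_t rdi, uint64_t rsi) {
    uint64_t address = rdi + rsi;   /* 64-bit effective address */
    return (uint32_t)address;       /* truncate to eax */
}

/* Reference: a plain 32-bit add, wrapping modulo 2^32 like `add eax, esi`. */
uint32_t add32(uint32_t edi, uint32_t esi) {
    return edi + esi;
}
```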

Using lea often saves an instruction and is useful if both of the operands are still needed later on in other calculations (as it leaves them unchanged). It can also run on several of the CPU’s execution units, so more than one can execute in the same cycle. Compilers know this though, so you don’t have to worry!

See the video that accompanies this post.


This post is day 2 of Advent of Compiler Optimisations 2025, a 25-day series exploring how compilers transform our code.

This post was written by a human (Matt Godbolt) and reviewed and proof-read by LLMs and humans.

Support Compiler Explorer on Patreon or GitHub, or by buying CE products in the Compiler Explorer Shop.
