Written by me, proof-read by an LLM.
Details at end.
Yesterday we saw how compilers zero registers efficiently. Today let’s look at something a tiny bit less trivial (though not by much): adding two integers. What do you think a simple x86 function to add two ints would look like? An add, right? Let’s take a look!
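For reference, here’s a minimal sketch of the kind of function in question. The name add and the parameter names x and y are assumptions for illustration, and the assembly in the comment is what GCC and Clang typically emit for x86-64 Linux with optimisation enabled, not a guaranteed output.

```c
// A minimal function that adds two ints.
// On x86-64 Linux (System V ABI) x arrives in edi, y in esi,
// and the result is returned in eax.
int add(int x, int y) {
    return x + y;
}

// Typical optimised output (Intel syntax):
//   add:
//     lea eax, [rdi + rsi]
//     ret
```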
Probably not what you were thinking, right? x86 is unusual in mostly having a maximum of two operands per instruction. There’s no add instruction to add edi to esi, putting the result in eax. On an ARM machine this would be a simple add r0, r0, r1 or similar, as ARM has a separate destination operand. On x86, things like add are not result = lhs + rhs but lhs += rhs. This can be a limitation, as we don’t get to control which register the result goes into, and we in fact lose the old value of lhs.
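To make the two-operand restriction concrete, here’s a rough model in C. The function names add_two_operand and add_keeping_inputs are made up for this sketch; the assembly in the comments is the two-instruction sequence a compiler would need if it could only use mov and add.

```c
#include <stdint.h>

// Models x86's two-operand add: the destination is also the left-hand
// source, so its old value is overwritten (lhs += rhs).
void add_two_operand(uint32_t *lhs, uint32_t rhs) {
    *lhs += rhs;
}

// To keep both inputs and choose the destination, a copy is needed
// first. This mirrors the sequence
//   mov eax, edi
//   add eax, esi
uint32_t add_keeping_inputs(uint32_t x, uint32_t y) {
    uint32_t result = x;          // mov eax, edi
    add_two_operand(&result, y);  // add eax, esi
    return result;
}
```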
So how do compilers work around this limitation? The answer lies in an unexpected place - the sophisticated memory addressing system of the x86. Nearly every operand can be a memory reference - there’s no specific “load” or “store”; a mov can just refer to memory directly. Those memory references are pretty rich: you can refer to memory addressed by a constant, relative to a register, or relative to a register plus a second register (optionally multiplied by 1, 2, 4 or 8) plus a constant offset. Something like add eax, dword ptr [rdi + rsi * 4 + 0x1000] is still a single instruction!
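The arithmetic behind such a memory operand can be sketched in C. The function effective_address and its parameter names are hypothetical, chosen only to mirror the base + index * scale + displacement form; real hardware restricts the scale to 1, 2, 4 or 8 and the displacement to a signed 32-bit constant.

```c
#include <stdint.h>

// Computes base + index * scale + displacement, the general form of an
// x86 memory operand. The hardware only allows scale to be 1, 2, 4 or 8;
// this sketch trusts the caller to respect that.
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t displacement) {
    // Sign-extend the displacement to 64 bits before adding.
    return base + index * scale + (uint64_t)(int64_t)displacement;
}
```

For the operand [rdi + rsi * 4 + 0x1000] with rdi holding 0x2000 and rsi holding 3, this gives 0x2000 + 12 + 0x1000 = 0x300C.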
Sometimes you don’t want to access the memory at one of these complex addresses, you just want to calculate what the address would be. Sort of like C’s “address-of” (&) operator. That’s what lea (Load Effective Address) does: it calculates the address without touching memory.
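In C terms, the analogy looks roughly like this. The function name element_address is an assumption for illustration, and the lea in the comment is what compilers typically produce for it on x86-64 rather than a guaranteed output.

```c
#include <stddef.h>

// Returns the address of element i without reading the element.
// For this, compilers typically emit something like
//   lea rax, [rdi + rsi * 4]
// because sizeof(int) is 4 on x86-64.
int *element_address(int *array, size_t i) {
    return &array[i];
}
```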
Why is this useful for addition? Well, if we’re not actually accessing memory, we can abuse the addressing hardware as a calculator! That complex addressing mode with its register-plus-register-times-scale is really just shifting and adding - so lea becomes a cheeky way to do three-operand addition.
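A couple of small examples show the shift-and-add trick beyond plain addition. The function names times_five and scaled_sum are invented for this sketch, and the instructions in the comments are typical of what GCC and Clang emit; the exact choice can vary by compiler version and target CPU.

```c
// Multiplying by 5 is x + x * 4, which fits one addressing mode.
// Typical output: lea eax, [rdi + rdi * 4]
int times_five(int x) {
    return x * 5;
}

// A sum with a scaled term and a constant also fits the
// base + index * scale + displacement form.
// Typical output is something like: lea eax, [rdi + rsi * 4 + 10]
int scaled_sum(int x, int y) {
    return x + y * 4 + 10;
}
```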
The compiler writes our simple addition in terms of the address of memory at rdi offset by rsi. We get a full add of two registers and we get to specify the destination too. You’ll notice that the operands are referenced as rdi and rsi (the 64-bit version) even though we only wanted a 32-bit add: because we are using the memory addressing system, it calculates a 64-bit address by default. However, in this case it doesn’t matter: the low 32 bits of a sum depend only on the low 32 bits of its inputs, and the top bits are discarded when the result is written to the 32-bit eax.
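The claim that the upper bits don’t matter can be checked with a small sketch. The names lea_style_add and add32 are made up here, and the parameters rdi and rsi are ordinary variables standing in for the registers. Unsigned types are used so that wraparound is well defined in C.

```c
#include <stdint.h>

// Adds two 64-bit register values and keeps only the low 32 bits,
// mimicking lea eax, [rdi + rsi].
uint32_t lea_style_add(uint64_t rdi, uint64_t rsi) {
    return (uint32_t)(rdi + rsi);
}

// A plain 32-bit add on the low halves, for comparison.
uint32_t add32(uint32_t x, uint32_t y) {
    return x + y;
}
```

Whatever sits in the upper halves of the 64-bit inputs, the two functions agree on the low 32 bits, because carries in an addition only ever travel upwards.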
Using lea often saves an instruction (there’s no separate mov to copy an operand first), leaves both operands unchanged in case they are still needed later on in other calculations, and can run on several of a modern x86 core’s execution units, so more than one can execute in the same cycle. Compilers know this though, so you don’t have to worry!
See the video that accompanies this post.
This post is day 2 of Advent of Compiler Optimisations 2025, a 25-day series exploring how compilers transform our code.
This post was written by a human (Matt Godbolt) and reviewed and proof-read by LLMs and humans.
Support Compiler Explorer on Patreon or GitHub, or by buying CE products in the Compiler Explorer Shop.