WebAssembly：如何分配你的分配器

WebAssembly：如何分配你的分配器
WebAssembly: How to Allocate Your Allocator

原始链接: https://nullprogram.com/blog/2025/04/19/

本文讨论了 WebAssembly (WASM) 程序的内存分配策略，重点关注 WASM 嵌入式环境和工具链不成熟带来的挑战。文章探讨了三种方法：导出静态堆、动态增长堆和导入动态堆。导出静态堆在嵌入式系统中很常见，需要在链接时预留内存，但 WASM 工具的限制使得必须在高级代码中定义它，这可能导致类型不兼容。此外，LLVM 在导入内存配置中处理内存初始化的方式会大幅增加 WASM 文件的大小。动态增长堆则使用 `memory.grow` 指令（通过 `__builtin_wasm_memory_grow` 访问）来请求额外的内存页，类似于 `sbrk`。虽然这种方法灵活，但它可能会使内存引用失效，从而使嵌入更加复杂。导入动态堆在受限环境中很常见，可以避免预分配问题。`memory.size` 指令和链接器提供的 `__heap_base` 常量用于确定堆的边界。文章重点介绍了每种方法的优缺点，并指出了 Clang 中缺乏 “noinit” 属性的问题。

Hacker News 最新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交登录 WebAssembly：如何分配你的分配器 (nullprogram.com) ingve 1小时前 10 分 | 隐藏 | 过去 | 收藏 | 1 评论 JonChesterfield 18分钟前 [–] > （如果Clang有一些“noinit”变量属性来允许堆不被初始化……）它是attribute((loader_uninitialized))，例如 https://github.com/llvm/llvm-project/blob/main/clang/test/Se... 回复加入我们，参加6月16-17日在旧金山举办的AI创业学校！指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请YC | 联系我们搜索：

Tree-shaking，园艺误导算法（2023） 2024-04-15

（评论） 2023-11-04

关于竞技场分配器的错误和巧妙用法 2025-04-14

(评论) 2025-04-14

（评论） 2024-09-02

原文

April 19, 2025

nullprogram.com/blog/2025/04/19/

An early, small hurdle diving into WebAssembly was allocating my allocator. On a server or desktop with virtual memory, the allocator asks the operating system to map fresh pages into its address space (sbrk, anonymous mmap, VirtualAlloc), which it then dynamically allocates to different purposes. In an embedded context, dynamic allocation memory is typically a fixed, static region chosen at link time. The WASM execution environment more resembles an embedded system, but both kinds of obtaining raw memory are viable and useful in different situations.

For the purposes of this discussion, the actual allocator isn’t important. It could be a simple arena allocator, or a more general purpose buddy allocator. It could even be garbage collected with Boehm GC. Though WebAssembly’s linear memory is a poor fit for such a conservative garbage collector. In a compact address space starting at zero, and which doesn’t include code, memory addresses will be small numbers, and less distinguishable from common integer values. There’s also the issue that the garbage collector cannot scan the WASM stack, which is hidden from WASM programs by design. Only the ABI stack is visible. So a garage collector requires cooperation from the compiler — essentially as a distinct calling convention — to spill all heap pointers on the ABI stack before function calls. WASM C and C++ toolchains do not yet support this in a practical capacity.

Exporting a static heap

Let’s start with the embedded case because it’s simpler, and reserve a dynamic memory region at link time. WebAssembly is just reached 8 years old, so it’s early, and as we keep discovering, WASM tooling is still immature. wasm-ld doesn’t understand linker scripts, and there’s no stable, low-level assembly language on which to build, e.g. to reserve space, define symbols, etc. WAT is too high level and inflexible for this purpose, as we’ll soon see. So our only option is to brute force it in a high-level language:

char heap[16<<20];  // 16MiB

Plugging it into an arena allocator:

typedef struct {
    char *beg;
    char *end;
} Arena;

Arena getarena(void)
{
    Arena a = {0};
    a.beg = heap;
    a.end = heap + sizeof(heap);
    return a;
}

Unfortunately heap isn’t generic memory, but a high-level variable with a specific, fixed type. That’s why it would have been nice to reserve the memory outside the high-level language. In practice this works fine so long as everything is aligned, but strictly speaking, allocating any variable except char from this arena involves incompatible loads and stores on a char array. Clang doesn’t document any inline assembly interface for WASM, but neither does Clang forbid it. That leaves just enough room to launder the pointer if you’re worried about this technicality:

Arena getarena(void)
{
    Arena a = {0};
    a.beg = heap;
    asm ("" : "+r"(a.beg));  // launder
    a.end = a.beg + sizeof(heap);
    return a;
}

The +r means a.beg is both input and output. The address of the heap goes into the black box, and as far as the compiler is concerned, some mystery address comes out which, critically, has no effective type. The assembly block is empty (""), so it’s just a no-op, and we know (wink wink) it’s really the same address. Because the heap was “used” by the black box, Clang won’t optimize the heap out of existence beneath us. Also note that a.end was derived from the laundered pointer.

This static variable technique works well only in an exported memory configuration, which is what wasm-ld uses by default. When a module exports its memory, it indicates how much linear memory it requires on start, and the WASM runtime allocates and zero-initializes it at module initialization time. C and C++ toolchains depend on that runtime zeroing to initialize static and global variables, which are defined to be so initialized. Compilers generate code assuming these variables are zero initialized. This same paradigm is used for .bss sections in hosted environments.

In an imported memory configuration, linear memory is uninitialized. The memory may be re-used from, say, a destroyed module without zeroing, and may contain arbitrary data. In that case, C and C++ toolchains must zero the memory explicitly. It could potentially be done with a memory.fill instruction in the start section, but LLVM does not support start sections. Instead it uses an active data segment — a chunk of data copied into linear memory by the WASM runtime during initialization, before running the start function.

That is, when importing memory, LLVM actually stores all those those zeros in the WASM module so that the runtime can copy it into linear memory. WASM has no built-in compression, so your WASM module will be at least as large as your heap! Exporting or importing memory is determined at link-time, so at compile-time the compiler must assume the worst case. If you compile the example above, you get a 16MiB “object” file (in WASM format):

$ clang --target=wasm32 -c -O example.c
$ du -h example.o
16.0M   example.o

The WAT version of this file is 48MiB — clearly unsuitable as a low-level assembler. If linking with exported memory, wasm-ld discards all-zero active data segments. If using an imported memory configuration, it’s copied into the final image, producing a huge WASM image, though highly compressible. As a rule, avoid importing memory when using an LLVM toolchain. Regardless, large heaps created this way will have a significant compile-time cost.

$ time echo 'char heap[256<<20];' | clang --target=wasm32 -c -xc -
real    0m0.334s
user    0m0.013s
sys     0m0.262s

(If only Clang had some sort of “noinit” variable attribute in order to allow heap to be uninitialized…)

Growing a dynamic heap

WASM programs can grow linear memory using an sbrk-like memory.grow instruction. It operates in quantities of pages (64kB), and returns the old memory size. Because memory starts at zero, the old memory size is also the base address of the new allocation. Clang provides access to this instruction via an undocumented built-in:

size_t __builtin_wasm_memory_grow(int, size_t);

The first parameter selects a memory because someday there might be more than one. From this built-in we can define sbrk:

void *sbrk(ptrdiff_t size)
{
    size_t npages = (size + 0xffffu) >> 16;  // round up
    size_t old    = __builtin_wasm_memory_grow(0, npages);
    if (old == -1ul) {
        return 0;
    }
    return (void *)(old << 16);
}

To which Clang compiles (note the memory.grow):

(func $sbrk (param i32) (result i32)
  (select
    (i32.const 0)
    (i32.shl
      (local.tee 0
        (memory.grow
          (i32.shr_u
            (i32.add
              (local.get 0)
              (i32.const 65535))
            (i32.const 16))))
      (i32.const 16))
    (i32.eq
      (local.get 0)
      (i32.const -1))))

Applying that to create an arena like before:

Arena newarena(ptrdiff_t cap)
{
    Arena a = {0};
    a.beg = sbrk(cap);
    if (a.beg) {
        a.end = a.beg + cap;
    }
    return a;
}

Now we can choose the size of the arena, and we can use this to create multiple arenas (e.g. permanent, scratch, etc.). We could even continue growing the last-created arena in-place when it’s full.

If there was no memory.grow instruction, it could be implemented as a request through an imported function. The embedder using the WASM runtime can grow the memory on the module’s behalf in the same manner. But as that documentation indicates, either way growing the memory comes with a downside in the most common WASM runtimes, browsers: It “detaches” the memory from references, which complicates its use for the embedder. If a WASM module may grow its memory at any time, the embedder must reacquire the memory handle after every call. It’s not difficult, but it’s easy to forget, and mistakes are likely to go unnoticed until later.

Importing a dynamic heap

There’s a middle ground where a WASM module imports a dynamic-sized heap. That is, linear memory beyond the module’s base initialization. This might be the case, for instance, in a programming competition, where contestants submit WASM modules which must complete a task using the supplied memory. In that case we don’t reserve a static heap, so we’re not facing the storing-zeros issue. However, how do we “find” the memory? Linear memory layout will look something like so:

0 <-- stack | data | heap --> ?
|-----------------------------|

This diagram reflects the more sensible wasm-ld --stack-first layout, where the ABI stack overflows off the bottom end of memory. The heap is just excess memory beyond the data. To find the upper bound, WASM has a memory.size instruction to query linear memory size, which again Clang provides as an undocumented built-in:

size_t __builtin_wasm_memory_size(int);

Like before, this returns the result in number of 64k pages. That’s the high end. How do we find the low end? Similar to __stack_pointer, the linker creates a __heap_base constant, which is the address delineating data and heap in the diagram above. To use it, we need to declare it:

extern char __heap_base[];

Notice how it’s an array, not a pointer. It doesn’t hold an address, it is an address. In an ELF context this would called an absolute symbol. That’s everything we need to find the bounds of the heap:

Arena getarena(void)
{
    Arena a = {0};
    a.beg = __heap_base;
    a.end = (char *)(__builtin_wasm_memory_size(0) << 16);
    return a;
}

Then we continue forward using whatever memory the embedder deigned to provide. Hopefully it’s enough!