Closures as Win32 Window Procedures

Original link: https://nullprogram.com/blog/2025/12/12/

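## Adding context to Win32 window procedures via JIT compilation

This article details a technique for adding a fifth parameter, a context pointer, to the standard Win32 window procedure (WNDPROC). Traditionally, a WNDPROC has no direct way to access program state. The author revisits an earlier approach based on JIT-compiled wrappers and improves it by allocating executable memory directly from the loader. This ensures the generated code (a "trampoline") resides near the main program's code, allowing efficient relative addressing.

The core idea is a small function (`make_wndproc`) that dynamically generates a trampoline: a short piece of executable code that adapts the window procedure to accept an additional context argument. The trampoline handles stack frame setup and argument passing.

The author provides a complete, runnable example and notes that the solution works even with Control Flow Guard enabled. Although more complicated than `GWLP_USERDATA`, the technique offers a potentially cleaner solution for cases such as libraries whose custom allocator interfaces lack a context pointer, giving a flexible way to bind run-time data to callbacks. The author suggests keeping it in reserve as a useful trick for future projects.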


Original article


Back in 2017 I wrote about a technique for creating closures in C using JIT-compiled wrappers. It’s neat, though rarely necessary in real programs, so I don’t think about it often. I applied it to qsort, which sadly accepts no context pointer. More practical would be working around insufficient custom allocator interfaces, to create allocation functions at run-time bound to a particular allocation region.

I’ve learned a lot since I last wrote about this subject, and a recent article had me thinking about it again, and how I could do better than before. In this article I will enhance Win32 window procedure callbacks with a fifth argument, allowing us to more directly pass extra context. I’m using w64devkit on x64, but everything here should work out-of-the-box with any x64 toolchain that speaks GNU assembly.

A window procedure has this prototype:

LRESULT Wndproc(
  HWND hWnd,
  UINT Msg,
  WPARAM wParam,
  LPARAM lParam
);

To create a window we must first register a class with RegisterClass, which accepts a set of properties describing a window class, including a pointer to one of these functions.

    MyState *state = ...;

    RegisterClassA(&(WNDCLASSA){
        // ...
        .lpfnWndProc   = my_wndproc,
        .lpszClassName = "my_class",
        // ...
    });

    HWND hwnd = CreateWindowExA(0, "my_class", ..., state);

The thread drives a message pump with events from the operating system, dispatching them to this procedure, which then manipulates the program state in response:

    for (MSG msg; GetMessageW(&msg, 0, 0, 0);) {
        TranslateMessage(&msg);
        DispatchMessageW(&msg);  // calls the window procedure
    }

All four WNDPROC parameters are determined by Win32. There is no context pointer argument. So how does this procedure access the program state? We generally have two options:

  1. Global variables. Yucky but easy. Frequently seen in tutorials.
  2. A GWLP_USERDATA pointer attached to the window.

The second option takes some setup. Win32 passes the last CreateWindowEx argument to the window procedure when the window is created, via WM_CREATE. This pointer is passed indirectly, through a CREATESTRUCT. The procedure then attaches the pointer to its window as GWLP_USERDATA. So ultimately it looks like this:

    case WM_CREATE: {
        // lParam carries a CREATESTRUCT whose lpCreateParams is the
        // last CreateWindowEx argument.
        CREATESTRUCT *cs = (CREATESTRUCT *)lParam;
        void *arg = cs->lpCreateParams;
        SetWindowLongPtr(hwnd, GWLP_USERDATA, (LONG_PTR)arg);
        // ...
    }

In future messages we can retrieve it with GetWindowLongPtr. Every time I go through this I wish there was a better way. What if there was a fifth window procedure parameter through which we could pass a context?

typedef LRESULT Wndproc5(HWND, UINT, WPARAM, LPARAM, void *);

We’ll build just this as a trampoline. The x64 calling convention passes the first four arguments in registers, and the rest are pushed on the stack, including this new parameter. Our trampoline cannot just stuff the extra parameter in a register, but will actually have to build a stack frame. Slightly more complicated, but barely so.

Allocating executable memory

In previous articles, and in the programs where I’ve applied techniques like this, I’ve allocated executable memory with VirtualAlloc (or mmap elsewhere). This introduces a small challenge for solving the problem generally: Allocations may be arbitrarily far from our code and data, out of reach of relative addressing. If they’re further than 2G apart, we need to encode absolute addresses, and in the simple case would just assume they’re always too far apart.
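
The 2G limit comes from the signed 32-bit displacement of a relative call or jump, which is measured from the start of the following instruction. A minimal sketch of that reach test, where rel32_fits is a hypothetical helper name and not part of the original program:

```c
#include <assert.h>
#include <stdint.h>

// Returns 1 and stores the displacement if `target` can be reached by a
// rel32 branch whose following instruction starts at `next`, else 0.
// Hypothetical helper for illustration only.
static int rel32_fits(uint64_t next, uint64_t target, int32_t *rel)
{
    assert(rel);  // caller must supply an output slot
    int64_t d = (int64_t)(target - next);  // negative for backward branches
    if (d < INT32_MIN || d > INT32_MAX) {
        return 0;
    }
    *rel = (int32_t)d;
    return 1;
}
```

A JIT compiler that cannot guarantee this returns 1 has to fall back to loading an absolute 64-bit address, which is the case the loader-provided buffer in this article avoids.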

These days I’ve more experience with executable formats, and allocation, and I immediately see a better solution: Request a block of writable, executable memory from the loader, then allocate our trampolines from it. Other than being executable, this memory isn’t special, and allocation works the usual way, using functions unaware it’s executable. By allocating through the loader, this memory will be part of our loaded image, guaranteed to be close to our other code and data, allowing our JIT compiler to assume a small code model.

There are a number of ways to do this, and here’s one way to do it with GNU-styled toolchains targeting COFF:

        .section .exebuf,"bwx"
        .globl exebuf
exebuf:	.space 1<<21

This assembly program defines a new section named .exebuf containing 2M of writable ("w"), executable ("x") memory, allocated at run time just like .bss ("b"). We’ll treat this like an arena out of which we can allocate all trampolines we’ll probably ever need. With careful use of .pushsection this could be basic inline assembly, but I’ve left it as a separate source. On the C side I retrieve this like so:

typedef struct {
    char *beg;
    char *end;
} Arena;

Arena get_exebuf(void)
{
    extern char exebuf[1<<21];
    Arena r = {exebuf, exebuf+sizeof(exebuf)};
    return r;
}

Unfortunately I have to repeat myself on the size. There are different ways to deal with this, but this is simple enough for now. I would have loved to define the array in C with the GCC section attribute, but as is usually the case with this attribute, it’s not up to the task, lacking the ability to set section flags. Besides, by not relying on the attribute, any C compiler could compile this source, and we only need a GNU-style toolchain to create the tiny COFF object containing exebuf.

While we’re at it, a reminder of some other basic definitions we’ll need:

#define S(s)            (Str){s, sizeof(s)-1}
#define new(a, n, t)    (t *)alloc(a, n, sizeof(t), _Alignof(t))

typedef struct {
    char     *data;
    ptrdiff_t len;
} Str;

Str clone(Arena *a, Str s)
{
    Str r = s;
    r.data = new(a, r.len, char);
    memcpy(r.data, s.data, (size_t)r.len);
    return r;
}

Which have been discussed at length in previous articles.
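
The new macro relies on an alloc function that is not shown here. For readers who have not seen those articles, a minimal sketch of a bump allocator matching the macro's call shape follows. The parameter order (count, size, align) is inferred from the macro, and zeroing the memory and aborting on exhaustion are assumptions of this sketch, not necessarily what the linked program does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *beg;
    char *end;
} Arena;

// Bump-allocate `count` zeroed objects of `size` bytes from the front of
// the arena, aligned to `align` (a power of two). Assumed signature.
static void *alloc(Arena *a, ptrdiff_t count, ptrdiff_t size, ptrdiff_t align)
{
    assert(count >= 0 && size > 0 && align > 0);
    // Padding needed to round a->beg up to the next multiple of align.
    ptrdiff_t pad   = (ptrdiff_t)(-(uintptr_t)a->beg & (uintptr_t)(align - 1));
    ptrdiff_t avail = a->end - a->beg - pad;
    if (avail < 0 || count > avail / size) {
        abort();  // out of memory: this sketch simply gives up
    }
    char *r = a->beg + pad;
    a->beg = r + count * size;
    return memset(r, 0, (size_t)(count * size));
}
```

Nothing in it knows or cares whether the arena's memory is executable, which is why the same function can serve exebuf.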

Trampoline compiler

From here the plan is to create a function that accepts a Wndproc5 and a context pointer to bind, and returns a classic WNDPROC:

WNDPROC make_wndproc(Arena *, Wndproc5, void *arg);

Our window procedure now gets a fifth argument with the program state:

LRESULT my_wndproc(HWND, UINT, WPARAM, LPARAM, void *arg)
{
    MyState *state = arg;
    // ...
}

When registering the class we wrap it in a trampoline compatible with RegisterClass:

    RegisterClassA(&(WNDCLASSA){
        // ...
        .lpfnWndProc   = make_wndproc(a, my_wndproc, state),
        .lpszClassName = "my_class",
        // ...
    });

All windows using this class will readily have access to this state object through their fifth parameter. It turns out setting up exebuf was the more complicated part, and make_wndproc is quite simple!

WNDPROC make_wndproc(Arena *a, Wndproc5 proc, void *arg)
{
    Str thunk = S(
        "\x48\x83\xec\x28"      // sub   $40, %rsp
        "\x48\xb8........"      // movq  $arg, %rax
        "\x48\x89\x44\x24\x20"  // mov   %rax, 32(%rsp)
        "\xe8...."              // call  proc
        "\x48\x83\xc4\x28"      // add   $40, %rsp
        "\xc3"                  // ret
    );
    Str r   = clone(a, thunk);
    int rel = (int)((uintptr_t)proc - (uintptr_t)(r.data + 24));
    memcpy(r.data+ 6, &arg, sizeof(arg));
    memcpy(r.data+20, &rel, sizeof(rel));
    return (WNDPROC)r.data;
}

The assembly allocates a new stack frame, with callee shadow space, and with room for the new argument, which also happens to re-align the stack. It stores the new argument for the Wndproc5 just above the shadow space. Then calls into the Wndproc5 without touching other parameters. There are two “patches” to fill out, which I’ve initially filled with dots: the context pointer itself, and a 32-bit signed relative address for the call. It’s going to be very near the callee. The only thing I don’t like about this function is that I’ve manually worked out the patch offsets.
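
One way to avoid hand-computed offsets is to emit the instructions piece by piece and record the running length at each patch site. The following is only a sketch of that idea: Thunk, emit, and build_thunk are hypothetical names, the bytes are written to an ordinary buffer rather than executable memory, and a little-endian host is assumed (as it already is by the memcpy patches above):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned char code[32];
    ptrdiff_t     len;
    ptrdiff_t     arg_off;  // offset of the 8-byte context pointer
    ptrdiff_t     rel_off;  // offset of the 4-byte call displacement
} Thunk;

static void emit(Thunk *t, const void *bytes, ptrdiff_t n)
{
    assert(n >= 0 && n <= (ptrdiff_t)sizeof(t->code) - t->len);
    memcpy(t->code + t->len, bytes, (size_t)n);
    t->len += n;
}

// Same instruction sequence as make_wndproc. `run_addr` is the address
// the bytes will execute from; `proc` is assumed to be within rel32 range.
static Thunk build_thunk(uint64_t run_addr, uint64_t proc, uint64_t arg)
{
    Thunk t = {0};
    emit(&t, "\x48\x83\xec\x28", 4);      // sub   $40, %rsp
    emit(&t, "\x48\xb8", 2);              // movq  $arg, %rax
    t.arg_off = t.len;
    emit(&t, &arg, 8);
    emit(&t, "\x48\x89\x44\x24\x20", 5);  // mov   %rax, 32(%rsp)
    emit(&t, "\xe8", 1);                  // call  proc
    t.rel_off = t.len;
    // Displacement is relative to the end of the call instruction.
    int32_t rel = (int32_t)(int64_t)(proc - (run_addr + (uint64_t)t.len + 4));
    emit(&t, &rel, 4);
    emit(&t, "\x48\x83\xc4\x28", 4);      // add   $40, %rsp
    emit(&t, "\xc3", 1);                  // ret
    return t;
}
```

For this sequence the recorded offsets come out to 6 and 20, matching the constants in make_wndproc, and the total length is 29 bytes.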

It’s probably not useful, but it’s easy to update the context pointer at any time if you hold onto the trampoline pointer:

void set_wndproc_arg(WNDPROC p, void *arg)
{
    memcpy((char *)p+6, &arg, sizeof(arg));
}

So, for instance:

    MyState *state[2] = ...;  // multiple states
    WNDPROC proc = make_wndproc(a, my_wndproc, state[0]);
    // ...
    set_wndproc_arg(proc, state[1]);  // switch states

Though I expect the most common case is just creating multiple procedures:

    WNDPROC procs[] = {
        make_wndproc(a, my_wndproc, state[0]),
        make_wndproc(a, my_wndproc, state[1]),
    };

To my slight surprise these trampolines still work with an active Control Flow Guard system policy. Trampolines do not have stack unwind entries, and I thought Windows might refuse to pass control to them.

Here’s a complete, runnable example if you’d like to try it yourself: main.c and exebuf.s

Better cases

This is more work than going through GWLP_USERDATA, and real programs have a small, fixed number of window procedures — typically one — so this isn’t the best example, but I wanted to illustrate with a real interface. Again, perhaps the best real use is a library with a weak custom allocator interface:

typedef struct {
    void *(*malloc)(size_t);   // no context pointer!
    void  (*free)(void *);     // "
} Allocator;

void *arena_malloc(size_t, Arena *);

// ...

    Allocator perm_allocator = {
        .malloc = make_trampoline(exearena, arena_malloc, perm),
        .free   = noop_free,
    };
    Allocator scratch_allocator = {
        .malloc = make_trampoline(exearena, arena_malloc, scratch),
        .free   = noop_free,
    };
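
make_trampoline is not defined in this article. Because the bound value here is the second parameter, which the Windows x64 convention passes in RDX, a trampoline for this case would not need a stack frame at all: it can load RDX and tail-jump to the target. A rough sketch of just the encoding step, with encode_bind2 as a hypothetical name, writing to an ordinary buffer and assuming a little-endian host:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BIND2_LEN = 15 };

// Writes a 15-byte x64 trampoline into `dst` that sets RDX (the second
// integer argument in the Windows x64 convention) to `arg`, then jumps
// to `proc`. `run_addr` is the address the bytes will execute from.
// Hypothetical helper; `proc` must be within rel32 range of the code.
static void encode_bind2(unsigned char *dst, uint64_t run_addr,
                         uint64_t proc, uint64_t arg)
{
    // Displacement is relative to the end of the jmp instruction.
    int64_t d = (int64_t)(proc - (run_addr + BIND2_LEN));
    assert(d >= INT32_MIN && d <= INT32_MAX);
    int32_t rel = (int32_t)d;

    memcpy(dst, "\x48\xba", 2);  // movq  $arg, %rdx
    memcpy(dst + 2, &arg, 8);
    dst[10] = 0xe9;              // jmp   proc
    memcpy(dst + 11, &rel, 4);
}
```

A full make_trampoline would presumably allocate BIND2_LEN bytes from the executable arena, pass that address as both dst and run_addr, and return it cast to the malloc function pointer type.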

Something to keep in my back pocket for the future.
