# An x86-64 back end for raven-uxn

Original link: https://www.mattkeeter.com/blog/2026-03-15-uxn/

## Uxn CPU Implementation and AI-Assisted Development

This article details the process of creating an x86-64 assembly implementation of the Uxn CPU, a fictional CPU used in the Hundred Rabbits ecosystem. The author had previously written fast Rust and ARM64 assembly implementations, and used large language models (specifically Anthropic's Claude and Opus) to port the assembly code to x86-64. Initially, Claude autonomously generated a working but imperfect x86-64 implementation at a cost of about $29. While substantial manual cleanup was required to fix issues such as register misuse and inefficient instruction choices, it provided a crucial starting point that greatly accelerated development. Fuzzing subsequently uncovered a bug that required further debugging, and Opus 4.6 demonstrated impressive debugging ability, even identifying a subtle out-of-bounds write. The process highlights the value of comprehensive testing (unit tests and fuzzing) for AI-assisted coding. While the author remains cautious about relying entirely on AI-generated code, they acknowledge its potential to lower development barriers and take on more complex tasks. The final implementation has now been merged and released, demonstrating a successful but careful collaboration between human and AI engineering.

The Hacker News discussion centers on the new x86-64 back end for the Uxn virtual machine and, more broadly, on a philosophical debate about LLMs and sustainable computing. The original post sparked discussion because a recommendation of the author's work had been removed from a resource list associated with the "100R" movement, an organization focused on sustainability and ecological justice in computing. Commenters suggested that 100R found the author's views on LLMs insufficiently critical, particularly regarding their social and economic impact. The discussion highlights the tension between those who embrace resource-intensive technologies like LLMs and those who prioritize low-impact, alternative approaches to computing. Some pushed back against a blanket rejection of LLMs, arguing that they can be useful tools even within a sustainability framework, while others raised concerns about their environmental cost and their potential to displace human skills. There were also questions about whether it is ethically and practically viable to use current systems, LLMs included, to build more sustainable alternatives. Finally, a side conversation touched on the Uxn machine itself, clarifying that it does not use a traditional ROM but rather self-modifying code loaded into RAM, which makes just-in-time (JIT) compilation challenging.
## Original Article

Uxn is a fictional CPU, used as a target for various applications in the Hundred Rabbits ecosystem. It's a simple stack machine with 256 instructions:

Screenshot of Uxn opcodes

My implementation of the Uxn CPU now has an x86-64 assembly implementation, which is about twice as fast as my Rust implementation. This required porting about 2000 lines of ARM64 assembly to x86-64, which was accomplished with the help of a robot buddy.


Let me provide a little more context.

A few years back, I wrote a Rust implementation of the CPU and peripherals, which was 10-20% faster than the reference implementation. For more background info, see that project's writeup:

Screenshot of Varvara

The Rust implementation is fast, but suffers from the usual downsides of a bytecode-based VM: the main dispatch statement is an unpredictable branch.

I then wrote an assembly implementation of the interpreter, which proved to be about 30% faster than the Rust version. This was hard: it took several days of work, and there were lingering bugs that I didn't discover until I added a fuzz tester to check for discrepancies between the Rust and assembly implementation.

The assembly implementation is written for an ARM64 target, for two reasons:

  • I'm working on an ARM MacBook
  • Writing ARM assembly by hand is a fun intellectual exercise because the ISA is pleasantly orthogonal and well-organized, while x86 assembly is... less so

My blog post about the assembly implementation concludes with an optimistic statement:

On a brighter note, it should be relatively easy to port all of the assembly code to x86-64, but I'll leave that as a challenge for someone else!

I wrote that back in late 2024, and no one had yet risen to the challenge, so I decided to do it (kinda) myself. Because this is early 2026, you may know where this is going: the first draft was written autonomously by Claude Code.

Yes, that's right – it's finally my turn to test out the hip new coding agents on a problem that I know relatively well.

(This blog post was 100% written by me, a fleshy human, because I think that passing off AI-written text as human-authored is an insult to the reader)

How did it do?

In short, it did a great job of going from "zero to one": if I were given a blank text editor and asked to write the x86 implementations of every Uxn opcode, I would have done much worse.

The resulting implementation worked, passing both my unit tests and the fuzzer.

This was all basically autonomous: I deliberately did not help the agent with any implementation or debugging details, limiting my feedback to high-level strategy.

The assembly itself was of middling quality – and I then spent a while improving it – but the agent provided an invaluable boost of momentum to kick off the work.

The whole thing cost about $29, billed through an enterprise plan. I'm not sure how this would have gone with an unmetered plan, e.g. whether I would have hit usage limits midway through the process.

The implementation took a few hours of work, but only 15-20 minutes of hands-on time; the main speed limit was me noticing that it was waiting for approval to run a new command.

(This was all running on a disposable Oxide Computer VM, so I probably should have just run it with --dangerously-skip-permissions)

The implementation process

I started by giving the agent an overview of the problem and a description of my existing implementation:

The raven-uxn project implements a fictional CPU. There are two implementations: a safe Rust implementation, and a native code implementation. In the native implementation, we have hand-written assembly functions for each of the 256 opcodes, written with tail recursion so each instruction jumps to the next instruction. This is fast because there's no big case statement dispatching. However, the x86 implementation isn't yet working. Get it working: it should build with cargo build --features=native.

It successfully added an x86 assembly backend and got it compiling, which required a few rounds of tweaking the assembly syntax and re-running cargo build. At this point, I told the agent how to run unit tests:

Now that it's building, it should pass tests with cargo test -praven-uxn --features=native.

The agent found a bunch of failing unit tests, and fixed them mostly-autonomously. My contribution at this stage was telling it to fix one opcode at a time, because it was trying to devise a Grand Plan to fix every single failing test in one pass, and going off the rails. Because failures typically presented as SIGSEGV instead of assertion errors, I also instructed the agent to run tests one by one – otherwise, it was impossible to tell which specific opcode test caused the segfault.

Finally, once tests were passing, I told the agent how to run the fuzzer:

Great, now let's test it with fuzzing. You can run cargo +nightly fuzz run --release fuzz-native -- -max_total_time=10 -verbosity=0 to fuzz for 10 seconds and see if it finds anything.

A bunch more failures cropped up at this point, and it continued to debug them autonomously. Interestingly, at this point it started writing ad-hoc gdb scripts to probe the interpreter's internal state:

break _DEI
commands
  info registers rsp rbp
  x/4gx $rsp+0xd0
  x/4gx $rsp+0xd8
  x/4gx $rsp+0xe0
  cont
end
break raven_uxn::native::dei_entry
run --test-threads=1 dei
info registers rdi rsi
x/4gx $rsi
p *(raven_uxn::native::DeviceHandle*)$rsi

The internal thinking traces are a fascinating artifact; the thing debugs like a goldfish with logorrhea:

The comment says "orig - 8 - 48 - 0x98 = orig - 0xd0" and then "[orig+8] = [rsp + 0xd0 + 8] = [rsp + 0xd8]". But this is wrong. If "orig" = rsp BEFORE the call (i.e., rsp_caller), then: 7th arg is at [orig - 8] BEFORE the call (pushed last by caller)... no, in SysV x86_64:

Actually the comment's definition of "orig" is inconsistent. Let me figure out what "orig" means in the comment:

So "orig" is the reference point where the return address sits at [orig - 8], the 7th argument at [orig + 8], and the 8th argument at [orig + 16]. This matches the x86_64 SysV ABI where after the call instruction, the stack layout has the return address at the top and the caller's arguments positioned above it. I'm working through the stack frame layout and realizing the comment's offset claims don't align with the actual memory addresses—the math just doesn't check out. But from the debugger output, I can see the 7th argument (vm) is actually at [rsp+0xd...], which gives me the concrete data I need to move forward.

After a bunch more round-trips through the fuzzer, it stopped finding bugs and just started running the fuzzer for longer and longer periods. After 60 seconds of fuzzing with no hits, I declared success.

Why did this work?

This won't be a surprising sentiment if you've read blog posts of this nature: the agent worked well because there was a comprehensive test suite and a fuzzing harness, so it could easily close the loop.

The first implementation did not compile; once it compiled, it did not pass unit tests; once it passed unit tests, it did not pass fuzz testing. Having all of these layers of (machine-checkable) tests was necessary to get a fully working implementation.

I suspect it also worked because the problem is translation-flavored: there was a full ARM64 assembly implementation, and translating from one assembly flavor to another is easier than writing it from a high-level specification (or even from the Rust code).

How was the code?

I'm not an x86 assembly expert, but even I could tell that there were a few questionable decisions. Let me give you a few examples.

Claude seemed to get caller / callee registers confused: it properly handled callee-saved registers in the function prologue and epilogue, but also insisted on saving them before doing a call to an external function. This increased stack usage and added a bunch of unnecessary instructions to each external function call:

; Save all interpreter state to the stack frame and set up args for C call
; C calling convention: arg1=rdi (VM ptr), arg2=rsi (DeviceHandle ptr)
.macro precall
    ; Write stack indices back through the pointers saved at entry
    mov rax, qword ptr [rsp + 0x30]   ; stack_index pointer
    mov byte ptr [rax], r12b
    mov rax, qword ptr [rsp + 0x38]   ; ret_index pointer
    mov byte ptr [rax], r14b

    ; Save interpreter registers
    ; [Human note: all of these are callee-saved!]
    mov qword ptr [rsp + 0x58], rbx
    mov qword ptr [rsp + 0x60], r12
    mov qword ptr [rsp + 0x68], r13
    mov qword ptr [rsp + 0x70], r14
    mov qword ptr [rsp + 0x78], r15
    mov qword ptr [rsp + 0x80], rbp

    ; Set up args: VM ptr and DeviceHandle ptr
    mov rdi, qword ptr [rsp + 0x40]
    mov rsi, qword ptr [rsp + 0x48]
.endm

It was also obsessed with using eax for everything, to its own detriment! The functions would often shuffle data into eax initially, then move it to a different register to make room for putting more data into eax. In the System-V ABI, there are nine scratch registers available, and I found that I could often tighten the code by using them:

; Claude's initial DIV2 implementation
;
; Note that it pushes to the x86 stack because
; it keeps using rax for temporary values!
_DIV2:
    movzx eax, byte ptr [rbx + r12]
    stk_pop
    movzx ecx, byte ptr [rbx + r12]
    stk_pop
    shl ecx, 8
    or eax, ecx                       ; b (divisor, top short)

    movzx ecx, byte ptr [rbx + r12]
    stk_pop
    movzx edx, byte ptr [rbx + r12]
    shl edx, 8
    or ecx, edx                       ; a (dividend, second short)

    ; 16-bit unsigned divide: a / b
    push rax                          ; save divisor (b) onto x86 stack [?!]
    mov eax, ecx                      ; dividend (a) in eax
    movzx eax, ax
    xor edx, edx
    pop rcx                           ; restore divisor into ecx
    movzx ecx, cx
    test cx, cx
    jz 1f
    div cx                            ; ax = a / b
    jmp 2f
1:
    xor eax, eax                      ; div by zero → 0
2:
    movzx r8d, al                     ; save result_lo
    shr eax, 8
    mov byte ptr [rbx + r12], al      ; store result_hi at current pos
    stk_push r8b                      ; push result_lo on top
    next

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Version tweaked by human
;
; This version loads the divisor / dividend directly into ecx and eax
_DIV2:
    movzx ecx, byte ptr [rbx + r12]
    stk_pop
    movzx r9d, byte ptr [rbx + r12]
    stk_pop
    shl r9d, 8
    or ecx, r9d                        ; ecx = b (divisor)

    movzx eax, byte ptr [rbx + r12]
    stk_pop
    movzx edx, byte ptr [rbx + r12]
    shl edx, 8
    or eax, edx                        ; eax = a (dividend), already zero-extended

    ; 16-bit unsigned divide: a / b
    xor edx, edx
    test cx, cx
    jz 1f
    div cx                             ; ax = a / b
    jmp 2f
1:
    xor eax, eax                       ; div by zero → 0
2:
    mov r8b, al                        ; save result_lo
    shr eax, 8
    mov byte ptr [rbx + r12], al       ; store result_hi
    stk_push r8b
    next

Finally, Claude was hesitant to use 8- and 16-bit operations, preferring 32-bit operations followed by masking the results. This behavior is likely a legacy of translating the ARM assembly, which used the "operation then mask" pattern everywhere because the ISA does not have instructions for 8- or 16-bit wrapping arithmetic.

These idiosyncrasies made a difference, squeezing another non-trivial speedup out of the test ROMs that I was benchmarking:

|                | Fibonacci | Mandelbrot |
|----------------|-----------|------------|
| Rust           | 4.28 ms   | 341 ms     |
| x86 (initial)  | 2.45 ms   | 213 ms     |
| x86 (improved) | 1.70 ms   | 187 ms     |

One caveat applies: this was using Sonnet 4.6, and it's possible that Opus 4.6 would do a better job out of the gate. I also didn't yet have a closed-loop harness for performance testing, so I couldn't just tell the AI to make it faster.

Debugging a human-introduced bug

After doing all of this human cleanup, the fuzzer found a crash:

INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 3840651835
INFO: Loaded 1 modules   (4523 inline 8-bit counters): 4523 [0x55e78ed1fcd0, 0x55e78ed20e7b),
INFO: Loaded 1 PC tables (4523 PCs): 4523 [0x55e78ed20e80,0x55e78ed32930),
fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native: Running 1 inputs 1 time(s) each.
Running: ../foo.rom
=================================================================
==2999==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7ede9fd08800 in thread T0
    #0 0x55e78ec25556  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x10e556) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #1 0x55e78ec7a16e  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16316e) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #2 0x55e78ec7bf41  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x164f41) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #3 0x55e78ec84ae8  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16dae8) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #4 0x55e78ec85448  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16e448) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #5 0x55e78ec8422d  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16d22d) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #6 0x55e78ec8b9d5  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x1749d5) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #7 0x55e78eca66d6  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x18f6d6) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #8 0x55e78ecaf002  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x198002) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #9 0x55e78eccd4f6  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x1b64f6) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #10 0x7fcea07fdd8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 095c7ba148aeca81668091f718047078d57efddb)
    #11 0x7fcea07fde3f  (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: 095c7ba148aeca81668091f718047078d57efddb)
    #12 0x55e78eb98a24  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x81a24) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)

0x7ede9fd08800 is located 16384 bytes after 65536-byte region [0x7ede9fcf4800,0x7ede9fd04800)
allocated by thread T0 here:
    #0 0x55e78ec259c9  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x10e9c9) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #1 0x55e78ec801fc  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x1691fc) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #2 0x55e78ec79954  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x162954) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #3 0x55e78ec7bf41  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x164f41) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #4 0x55e78ec84ae8  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16dae8) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #5 0x55e78ec85448  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16e448) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #6 0x55e78ec8422d  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16d22d) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #7 0x55e78ec8b9d5  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x1749d5) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #8 0x55e78eca66d6  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x18f6d6) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #9 0x55e78ecaf002  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x198002) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #10 0x55e78eccd4f6  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x1b64f6) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #11 0x7fcea07fdd8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 095c7ba148aeca81668091f718047078d57efddb)

SUMMARY: AddressSanitizer: bad-free (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x10e556) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
==2999==ABORTING

(to be clear, this was something that I introduced when refactoring)

Out of the gate, this already has "nightmare bug" vibes: the fuzzer isn't failing because the interpreter and assembly implementation diverged in behavior, but due to an AddressSanitizer failure when freeing memory! I've seen this before, and it's never fun: it means that something in the assembly implementation is stomping over unrelated memory.

This was also a case of the fuzzer getting very, very lucky: it found this bug once, but in subsequent tests (with the bug still present) ran for hours without finding it again. (While debugging, I was seriously starting to wonder if it was a bug in libfuzzer itself)

The program that triggers this crash is:

SUB EQUk STZ2k ROT2 EQUr EORkr GTHkr SUB JCN2r

Its behavior isn't obvious to me, and it's surprisingly non-trivial. When the program terminates, the return stack is full of alternating 1 and 0, and the data stack has a more complex pattern of values – 120 zeros, then the following:

[0x00, 0x00, 0x00, 0x12, 0x12, 0x00, 0x00, 0xee,
 0x12, 0x12, 0x11, 0x11, 0x12, 0x11, 0x11, 0x11,
 0x11, 0x10, 0x10, 0x11, 0x10, 0x10, 0x10, 0x10,
 0x0f, 0x0f, 0x10, 0x0f, 0x0f, 0x0f, 0x0f, 0x0e,
 0x0e, 0x0f, 0x0e, 0x0e, 0x0e, 0x0e, 0x0d, 0x0d,
 0x0e, 0x0d, 0x0d, 0x0d, 0x0d, 0x0c, 0x0c, 0x0d,
 0x0c, 0x0c, 0x0c, 0x0c, 0x0b, 0x0b, 0x0c, 0x0b,
 0x0b, 0x0b, 0x0b, 0x0a, 0x0a, 0x0b, 0x0a, 0x0a,
 0x0a, 0x0a, 0x09, 0x09, 0x0a, 0x09, 0x09, 0x09,
 0x09, 0x08, 0x08, 0x09, 0x08, 0x08, 0x08, 0x08,
 0x07, 0x07, 0x08, 0x07, 0x07, 0x07, 0x07, 0x06,
 0x06, 0x07, 0x06, 0x06, 0x06, 0x06, 0x05, 0x05,
 0x06, 0x05, 0x05, 0x05, 0x05, 0x04, 0x04, 0x05,
 0x04, 0x04, 0x04, 0x04, 0x03, 0x03, 0x04, 0x03,
 0x03, 0x03, 0x03, 0x02, 0x02, 0x03, 0x02, 0x02,
 0x02, 0x02, 0x01, 0x01, 0x02, 0x01, 0x01, 0x01,
 0x01, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00]

It's clearly executing some kind of looping algorithm before it terminates.

To make matters worse, the program also runs fine using the raven-cli executable; the fuzzer gets (un)lucky that the program stomps on RAM that is monitored by AddressSanitizer.

I sicced Claude (Sonnet 4.6) on this, but it mostly spun its wheels; it's hard to tell whether that's because it wasn't making forward progress, or whether Anthropic's servers were particularly overloaded that day.

Eventually, I tracked it down myself: it was an out-of-bounds write of 0 in the STR instruction, which wrote to an address before the start of the VM's RAM. The correct location in the VM's RAM was already 0, so the interpreter and assembly implementation didn't diverge.

There was also a second reason the bug was so hard: as you may notice, STR wasn't in the program! The bytecode program writes data to RAM, then jumps to that address; the VM then treats that data as further bytecode.

Quick aside: The easy way to debug this issue

I spent a while doing printf debugging, which was not the best way to do it; as it turns out, Valgrind finds the out-of-bounds write, even when running the (seemingly-fine) raven-cli:

==61880== Invalid write of size 1
==61880==    at 0x41D4E4B: ??? (in /home/ubuntu/raven/target/release/raven-cli)
==61880==  Address 0x4cefbf0 is 16 bytes before a block of size 65,536 alloc'd
==61880==    at 0x4A8C36C: calloc (vg_replace_malloc.c:1678)
==61880==    by 0x41D8866: alloc_zeroed (alloc.rs:178)
==61880==    by 0x41D8866: alloc_impl_runtime (alloc.rs:190)
==61880==    by 0x41D8866: alloc_impl (alloc.rs:312)
==61880==    by 0x41D8866: allocate_zeroed (alloc.rs:435)
             # etc, etc

However, the address space is nonsense (0x41D4E4B).

It's then possible to combine it with GDB, by starting Valgrind with

$ valgrind --vgdb=yes --vgdb-error=0 ./target/release/raven-cli --native ../foo.rom

Then, from within GDB:

(gdb) target remote | /snap/valgrind/181/usr/libexec/valgrind/../../bin/vgdb
Remote debugging using | /snap/valgrind/181/usr/libexec/valgrind/../../bin/vgdb
relaying data between gdb and process 61880
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/8c/fa19934886748ff4603da8aa8fdb0c2402b8cf.debug...
0x000000000425c290 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) cont
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000041d4e4b in _STR ()
(gdb) disas
Dump of assembler code for function _STR:
   0x00000000041d4e39 <+0>:     movsbq (%rbx,%r12,1),%rax
   0x00000000041d4e3e <+5>:     dec    %r12b
   0x00000000041d4e41 <+8>:     mov    (%rbx,%r12,1),%cl
   0x00000000041d4e45 <+12>:    dec    %r12b
   0x00000000041d4e48 <+15>:    add    %rbp,%rax
=> 0x00000000041d4e4b <+18>:    mov    %cl,(%r15,%rax,1)
   0x00000000041d4e4f <+22>:    movzbl (%r15,%rbp,1),%eax
   0x00000000041d4e54 <+27>:    inc    %bp
   0x00000000041d4e57 <+30>:    lea    0x651a2(%rip),%rcx        # 0x423a000
   0x00000000041d4e5e <+37>:    jmp    *(%rcx,%rax,8)
End of assembler dump.

This is dead on, and would have saved me a few hours of frustration!

(For what it's worth, I'm comfortable with both Valgrind and GDB, but didn't know how to combine them; Claude Web helpfully provided the right commands)

Second aside: Hitting it with a bigger model

Out of curiosity, I reintroduced the bug into the codebase and threw Opus 4.6 (1M context window) at it, with --dangerously-skip-permissions:

raven-uxn implements several interpreters for the Uxn virtual machine. There's one implementation in Rust, but the interesting ones are in raw assembly. Writing raw assembly improves performance because we can write threaded code, where each instruction jumps directly to the next (Rust can't do this because it lacks guaranteed tail recursion). Anyways, I've been having a rare issue with the x86 assembly backend: one particular program sequence fails in fuzzing. The program sequence is the following opcodes: SUB EQUk STZ2k ROT2 EQUr EORkr GTHkr SUB JCN2r. When run in the fuzzer, this triggers an AddressSanitizer error. Interestingly, it does not trigger a check for discrepancies between the interpreter and assembly implementations, so it's producing the correct behavior (or incorrect behavior that doesn't change the end state of the VM). You can reproduce this with cargo +nightly fuzz run --release fuzz-native foo.rom. Your mission is to track down whatever is causing this issue.

I then went upstairs to make myself a cup of tea.

When I came back downstairs (after about 10 minutes), it had not solved the problem; indeed, it took a whole 18 minutes to figure it out. Along the way, it fixed five other instances of the incorrect pattern, which I hadn't noticed.

All of this — and subsequent semi-automated cleanups — cost another $25.

Opus did okay but not great at automated cleanup ("find all cases where we do a 32-bit load but only use the lowest 8 bits, and replace them with 8-bit loads"). It would often declare that it had fixed everything, only for me to find more examples of the undesirable pattern.

Is raven-uxn slop now?

I don't know, you tell me – ideally on social media, with personal insults and imprecations about my character!

Back in 2024, if someone had taken me up on my suggestion to write an x86 backend and had opened a PR with the same code that Claude delivered, I would have given it a similar amount of review / editing before merging it in.

(Honestly, I would have made fewer changes to a human PR, because I'm sensitive to completely ripping up someone's work; the diff from the original agent's implementation is substantial)

Is that implementation irrevocably tainted by its source, even after my edits?

A second perspective: if someone in 2026 had opened a PR with this same code and told me that Claude wrote it, I probably wouldn't have merged it – I don't trust strangers to apply the same level of engineering rigor when using LLMs.

Finally, this wouldn't have gotten done without Claude Code: I've got too much else to do, and the activation energy was too high. Is lowering energy barriers worth polluting the cognitive ecosystem with out-of-distribution entities?

What's next?

The PR is now merged, and a new 0.2.0 release is on the way.

This experience hasn't made me a vibe-coding maximalist; I find that the act of writing code myself is necessary to build the mental model of a complex system, and concerns about cognitive debt ring true to me.

However, I was impressed by Opus 4.6's ability to debug the subtle assembly bug, and will consider reaching for it in the future. There's an old aphorism that debugging requires being twice as clever as writing the code initially, so if you write code that's at your cleverness limit, you won't be able to debug it; if LLMs help with debugging, it frees me up to write more clever code!

(and I wish people would stop arguing that these tools don't work)
