I built a Game Boy emulator in F#

Original link: https://nickkossolapov.github.io/fame-boy/building-a-game-boy-emulator-in-fsharp/

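## Fame Boy: Building a Game Boy Emulator in F#

Driven by a wish to understand how computers *actually* work, Nick Kossolapov set out to build "Fame Boy", a Game Boy emulator written in F#. He started with courses such as "From NAND to Tetris" and a CHIP-8 emulator (Fip-8) to get a grip on core computing concepts.

Fame Boy took several months to complete and runs on both desktop and web. A simple interface (a framebuffer, an audio buffer, `stepEmulator()` and `getJoypadState()`) separates the emulator core from the frontends. The emulator closely mirrors the original Game Boy hardware, builds a functional domain model for the CPU, and uses a robust `IoController` to manage hardware interactions.

Performance optimisation was essential and led to trade-offs between idiomatic F# and practical speed. Mutability was adopted for memory access, and profiling showed that direct array access gave significant gains over domain-driven abstractions. AI tools helped with code review, bug hunting (including a critical timer issue), and even performance improvements.

Ultimately, Fame Boy was a successful learning experience that deepened Kossolapov's understanding of computer architecture and gave him a satisfying project fuelled by memories of his childhood Game Boy adventures. He shares detailed insights into the development process, including performance benchmarks and the balance between functional purity and practical implementation.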

Hacker News discussion: I built a Game Boy emulator in F# (nickkossolapov.github.io), 24 points, posted by elvis70 46 minutes ago, 2 comments

z500, 2 minutes ago: So cool! I love F#, but I wrote a small Smalltalk interpreter in it, and I can confirm it isn't very fast at this kind of thing if you use it the way it's intended, haha.

cermicelli, 16 minutes ago: Finally someone who actually put in the effort to learn something, instead of "an LLM helped me build something with Y in X minutes". I guess there is still hope for humanity.

hmokiguess, 14 minutes ago: F# is a lot of fun, great work!

Original article

Nick Kossolapov · April 2026

I’ve been working as a software engineer for over 8 years at this point, and admittedly I’ve never understood how computers actually work. So I figured I’d try to learn how they work by emulating one. Sorry Ben Eater, I’m not going to build one just yet.

I spent hundreds of hours as a kid catching Pokémon, so the Game Boy was the perfect candidate: real hardware, relatively simple in scope, and something with a strong personal connection.

Instead of jumping straight into it, I first did From NAND to Tetris. It was a great course, and it made me really understand the fundamentals of computers, like registers, memory, and the ALU. Then to get used to building an emulator, I built a CHIP-8 emulator in F#: Fip-8.

A few months later, and after many nights of going to bed at 2 AM even though I told myself I’d only work on it for an hour or two, I have a working Game Boy emulator: Fame Boy. It comes complete with sound and runs on desktop and web.

Fame Boy playing Pokémon Blue

I wanted to have the emulator work on both desktop and web, so I focused on having a simple interface between the emulator core and whatever frontend is running it.

The interface between the frontends and core is essentially just two arrays and two functions:

  • framebuffer - a 160x144 array of shades (white, light, dark, black).
  • audiobuffer - a ring audio buffer at a sample rate of 32768 Hz with read and write heads.
  • stepEmulator() - a function that executes one CPU instruction and returns the number of cycles taken.
  • getJoypadState(state) - a callback for the frontend to give the emulator the joypad state, usually once a frame.
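
To make the two arrays above more concrete, here is a minimal Python sketch (not the author's F# code) of a framebuffer and a ring audio buffer with separate read and write heads. The names `AudioRing`, `available`, `write`, and `read`, the shade encoding 0 to 3, and the choice to refuse writes when the buffer is full are all assumptions made for illustration.

```python
from array import array

SCREEN_WIDTH, SCREEN_HEIGHT = 160, 144
SAMPLE_RATE_HZ = 32768

# One shade per pixel; assumed encoding: 0 = white, 1 = light, 2 = dark, 3 = black.
framebuffer = bytearray(SCREEN_WIDTH * SCREEN_HEIGHT)


class AudioRing:
    """Fixed-size ring buffer with independent read and write heads (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.samples = array("f", [0.0] * capacity)
        self.capacity = capacity
        self.read_head = 0   # total samples consumed by the frontend
        self.write_head = 0  # total samples produced by the emulator

    def available(self) -> int:
        """Number of samples written but not yet read."""
        return self.write_head - self.read_head

    def write(self, sample: float) -> bool:
        """Emulator side: store one sample, or return False when the ring is full."""
        if self.available() == self.capacity:
            return False
        self.samples[self.write_head % self.capacity] = sample
        self.write_head += 1
        return True

    def read(self, count: int) -> list:
        """Frontend side: take up to `count` samples in the order they were written."""
        count = min(count, self.available())
        out = [self.samples[(self.read_head + i) % self.capacity] for i in range(count)]
        self.read_head += count
        return out
```

Keeping the heads as ever-increasing counters and wrapping only when indexing makes "how many samples are waiting" a single subtraction, which is the number a frontend needs when deciding whether it is running ahead or behind.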

Fame Boy architecture diagram

I tried to model Fame Boy in a similar way to the actual hardware of the Game Boy.

The CPU, like the real Sharp LR35902 in the Game Boy, knows nothing about the hardware except a memory map (and the IoController for interrupt signals only). It’s also the most F#-ish part of my codebase, leaning heavily into functional domain modelling.

Memory.fs holds most of the RAM used in the Game Boy, and acts as the memory map and bus between the CPU, IO Controller, and cartridge. It also shares a reference to the same VRAM and OAM RAM arrays with the PPU for performance.

IoController.fs emerged when I found myself adding too much logic to Memory.fs. While a singular IO controller doesn’t exist in the Game Boy hardware, handling all the hardware registers through it simplified the interfaces for the individual components and made them safer.

The stepper function in Emulator.fs is the glue that brings the whole emulator together, composing all the components’ individual step functions:
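
As a rough illustration of that kind of composition, here is a small Python sketch rather than the actual contents of Emulator.fs. It assumes the CPU step reports how many cycles it used and that every other component is then advanced by the same amount; `FakeCpu`, `Counter`, and the component order are hypothetical stand-ins.

```python
class FakeCpu:
    """Stand-in CPU: every instruction costs 4 cycles (real instructions vary)."""

    def step(self) -> int:
        return 4


class Counter:
    """Stand-in for a timer, PPU, or APU that only tallies the cycles it receives."""

    def __init__(self) -> None:
        self.cycles = 0

    def step(self, cycles: int) -> None:
        self.cycles += cycles


class Emulator:
    def __init__(self) -> None:
        self.cpu = FakeCpu()
        self.timer = Counter()
        self.ppu = Counter()
        self.apu = Counter()

    def step(self) -> int:
        """Run one CPU instruction, then advance the other components by its cost."""
        cycles = self.cpu.step()
        for component in (self.timer, self.ppu, self.apu):
            component.step(cycles)
        return cycles
```

Because the CPU decides how far time moves and everything else follows, the components cannot drift apart, and the frontend only ever needs the single step function.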

While my CHIP-8 emulator is completely pure (no mutable members and all arrays are copied for none of that side-effect nonsense), Fame Boy uses mutability liberally. The Game Boy runs a lot faster than the CHIP-8, and copying 16+ kB of memory a million times every second didn’t seem like the smart thing to do.

So, why F# for Fame Boy? Firstly, I think its extensive typing system works really well for modelling CPU instructions. Secondly, and more importantly, I just really like F#. I used to work primarily in F# at my previous company, and so I’m always looking for an excuse to keep on using it.

As an example of why I think the CPU modelling works well in F#, I was following Gekkio’s Complete Technical Reference when implementing the CPU. I grouped the instructions like the reference, and ended up with something like this in Instructions.fs:

CAMLBOY, I spotted lines like this:

Cpu/State.fs L81):

the joypad in Pandocs for those interested.

After I had finished and had a working emulator, I fleshed out the repo’s readme and was preparing to write this. But I was playing around with the web version and thought it felt a bit empty without the sound. And so I went ahead to try and add the APU, the audio processing unit (first mistake). I read a few blogs, and found that many emulators use the frontend audio sampling rate to drive the emulator, rather than the framerate. This sounded backwards to me, so I researched dynamic sampling rates and decided to use that with the framerate driving the emulator instead (second mistake).

Sound was the most challenging component for me conceptually. It took me a while to understand how the different sound registers and channels work. This is where AI as a teacher really shone. I had a lot of back and forth before trying to start coding. But much like the PPU, it was really satisfying as I completed each channel. And by doing one channel at a time, it helped me understand how music is actually formed. Hearing the Tetris song slowly come together and build up richness actually brought a smile to my face.

It wasn’t all sunshine and rainbows though. The CPU and PPU are basically just “once per frame, do exactly X number of things”, and you can calculate X easily. The APU, on the other hand, isn’t just calculation: it has so many things to choose and tune.

The only easy thing to choose was the APU’s sampling rate. The actual Game Boy’s APU was flexible, so emulators can use whatever sampling rate they want. I went with 32768 Hz, because that comes to 1 sample every 128 CPU cycles (the CPU clock is 4 194 304 Hz, and 4 194 304 / 32 768 = 128). So my APU state can use integers and still stay perfectly in sync. 128 is also divisible by 4, so I could batch the APU steps 4 at a time and never miss alignment with a CPU instruction.
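
The integer bookkeeping can be sketched in a few lines of Python. This is an illustration and not the author's APU code: it assumes cycles are counted against the Game Boy's 4 194 304 Hz clock, and `SampleClock` and `advance` are hypothetical names.

```python
CPU_CLOCK_HZ = 4_194_304   # Game Boy clock, counted in T-cycles per second
SAMPLE_RATE_HZ = 32_768
CYCLES_PER_SAMPLE = CPU_CLOCK_HZ // SAMPLE_RATE_HZ  # divides exactly, no remainder


class SampleClock:
    """Tracks elapsed cycles and reports when audio samples are due (hypothetical)."""

    def __init__(self) -> None:
        self.cycle_counter = 0

    def advance(self, cycles: int) -> int:
        """Add elapsed cycles and return how many samples should be emitted now."""
        self.cycle_counter += cycles
        due, self.cycle_counter = divmod(self.cycle_counter, CYCLES_PER_SAMPLE)
        return due
```

Since the clock divides evenly by the sample rate, the counter never accumulates a fractional error, and because instruction timings are multiples of 4 cycles, a sample boundary always lands exactly at the end of a 4-cycle batch.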

All the other things though were much more wonky. I’m not a sound engineer, so I was just changing values hoping for the best. And worst of all, not only did each frontend have its own quirks, but each platform did too. I got the sound working well on PC, but when I tried it out on a MacBook it sounded like a waterfall. I fixed the MacBook, and suddenly the desktop version on PC wouldn’t even run because of a race condition.

After hours of tweaking settings and failing, I abandoned trying to be clever with the dynamic sampling rates, and moved to having audio drive the emulator. That made audio much more reliable between all the devices.

It’s also definitely the leakiest part of the emulator-frontend interface, but that’s because audio does need to be precisely synced to avoid a cacophony.

To explain the difference between audio-driven and frame-driven, it’s more about understanding human perception.

You know when you’re listening to something, and it pops? There was a drop in the audio signal, so the speaker suddenly moves a lot due to the sudden change in the signal, creating a pop. The same thing happens when a video stutters: there wasn’t enough data coming in on time, so the video player has to skip a frame or two. Only now it’s not pushing something physical, so it’s less offensive to our senses.

Now back to driving the emulator. In Fame Boy, both audio and video are perfectly synced, because that’s how it’s designed. But the computer running it has independent audio and video, and either may occasionally fall behind. So when the frontend’s audio and video are out of sync, it has one of two options to try:

  1. Keep the frontend audio and emulator audio in sync, and occasionally drop frames
  2. Keep the frontend video and emulator frames in sync, and occasionally drop audio

So the one you choose “drives” the emulator, while still trying to keep the other one close. Driving with the frame rate is fairly straightforward. Here’s a simplified version:
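
The two options can be contrasted in a minimal Python sketch. It is not the author's F# code: `step_emulator` is a hypothetical callable that runs one instruction and returns its cycle count, and the sketch assumes 70 224 cycles per Game Boy frame (154 scanlines of 456 cycles) and 128 cycles per audio sample.

```python
CYCLES_PER_FRAME = 70_224   # 154 scanlines x 456 cycles
CYCLES_PER_SAMPLE = 128     # 4 194 304 Hz clock / 32 768 Hz sample rate


def run_frame(step_emulator) -> int:
    """Frame-driven: called once per display refresh, runs one frame's worth of cycles."""
    elapsed = 0
    while elapsed < CYCLES_PER_FRAME:
        elapsed += step_emulator()
    return elapsed


def run_until_audio_full(step_emulator, buffered: int, target: int) -> int:
    """Audio-driven: run just long enough to top the audio buffer up to `target` samples."""
    needed_cycles = max(0, target - buffered) * CYCLES_PER_SAMPLE
    elapsed = 0
    while elapsed < needed_cycles:
        elapsed += step_emulator()
    return elapsed
```

In the frame-driven loop the display decides how much emulated time passes, so audio can run dry and pop. In the audio-driven loop the amount of work depends on how many samples the sound device has consumed, so a frame may be shown late or skipped instead.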

?frame-driven as a query parameter in the URL. It should be visually smoother, but there will be the occasional audio pop. Also, even the audio-driven web frontend switches to being frame-driven when the mute button is pressed, as those pops won’t be audible anyway.

My implementation of this is far from perfect, though. Ultimately, I found the audio pops leave a worse impression than frame stutters, and leaving the emulator muted made it feel empty, and so I decided to make audio-driven the default in the web frontend. Audio is one of the few areas of Fame Boy I’m not quite happy with, and would like to revisit someday.

After I had gotten the PPU somewhat working and could see some things happening on the screen on the desktop, I was excited to try to move Fame Boy to the web. I hopped onto the Fable docs, installed a package, set up the main loop, added some styling, and within an hour or two I was ready to try and run it. I hit enter, and then:

Tetris, allegedly

Maybe this version of Tetris is set in winter in Siberia.

I tried debugging the issue for a bit, but instead of spending too much time on it I just moved on to trying WebAssembly with Blazor. It was also similarly easy to get up and running, and this time it actually worked.

But there was a problem: it was nigh unplayable, getting maybe 8 FPS. I’m still not sure what the issue was. I don’t think it was Blazor itself; the .NET team did publish performance guides that I tried to follow, but they didn’t help in the end. Debugging was also a pain, so I reluctantly went back to Fable to look into what could be going wrong with the transpilation into JavaScript.

To my surprise, Fable puts the transpiled JS files right next to the source code, and it’s actually quite readable.

Fame Boy's transpiled code

This made understanding the new code, and also debugging in the browser dev tools, quite straightforward. And when looking at the dev tools I noticed something was a bit wrong.

The CPU state in the browser's dev tools

The CPU registers in Fame Boy (and the Game Boy too) are 8-bit unsigned integers, so in the range 0-255. I’m not an expert, but I don’t think -15565461 is an 8-bit number. I went through the transpiled code and the Fable docs and found this:

(non-standard) Bitwise operations for 16 bit and 8 bit integers use the underlying JavaScript 32 bit bitwise semantics. Results are not truncated as expected, and shift operands are not masked to fit the data type.

That would explain the rampant B register. Hunting through the code to find all the places where 8-bit values needed to be truncated, I managed to find all the offenders, and voilà, the frontend worked perfectly. And since it’s only JS without the .NET runtime, the web bundle is only around 100 kB.
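
The effect is easy to reproduce outside Fable. The Python sketch below is an illustration, not Fame Boy code: `to_int32` mimics how JavaScript's bitwise operators wrap results to a signed 32-bit integer, and the two shift helpers are hypothetical names showing a register value with and without an explicit 8-bit mask.

```python
def to_int32(value: int) -> int:
    """Wrap an integer to a signed 32-bit value, as JavaScript bitwise operators do."""
    value &= 0xFFFF_FFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def shift_left_unmasked(register: int, amount: int) -> int:
    """What an 8-bit shift becomes when only 32-bit semantics apply."""
    return to_int32(register << (amount & 31))


def shift_left_u8(register: int, amount: int) -> int:
    """The same shift, truncated back into the 0-255 range of an 8-bit register."""
    return (register << (amount & 31)) & 0xFF


def complement_u8(register: int) -> int:
    """Bitwise NOT truncated to 8 bits; without the mask the result goes negative."""
    return to_int32(~register) & 0xFF
```

For example, shifting `0xF0` left by 4 should overflow an 8-bit register to `0x00`, but under 32-bit semantics it yields `0xF00`, and complementing `0x0F` yields -16 rather than `0xF0`. Masking with `& 0xFF` after each such operation restores the expected 8-bit behaviour.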

Outside of the weird uint8 issue (which most people shouldn’t have), I had a fairly pleasant experience with Fable. It was rather smooth, and it meant that all of the source code stays in F#.

Once I had gotten things showing on the screen, I was curious what performance was like. I added a simple FPS console log. At that point it was around 55-60 FPS in debug mode. Not great, not terrible. I think that was due to Raylib trying to maintain v-sync though. When I turned v-sync off the emulator jumped to around 70 FPS, but was jittery. I can always optimise later, and so I decided to power on with the PPU.

As I added more features, performance slowly dropped, eventually reaching 45 FPS, and turning off v-sync didn’t help. Later had arrived, so it was time to optimise. I fired up the profiler in JetBrains Rider, and saw this:

Spot the big number in the profiler

mapAddress looked very suspect. Virtually every component accesses the memory, but why is it so much higher than the rest? I drilled down into the other function calls (like stepPpu), and found everything was spending more time than I expected on accessing memory. So I went to the offending code:

one change doubled my FPS.

Later benchmarking showed the bulk of the improvements seems to have been JIT optimisations around branching and localised call sites, as making MemoryRegion a struct DU (so allocated on the stack) only improved performance by about 15%, with the other 85% coming from the removal of the DU and map function.
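
For readers unfamiliar with the idea, direct array access in a memory map looks roughly like the Python sketch below. It is only an illustration of the shape of the technique, not the author's Memory.fs: the `Memory` class is hypothetical, it models just a few regions of the standard Game Boy address space, and Python will not show the JIT effects described above.

```python
class Memory:
    """Reads go straight from the address range to the backing array (hypothetical)."""

    def __init__(self, rom: bytes) -> None:
        self.rom = rom                  # 0x0000-0x7FFF: cartridge ROM (no banking here)
        self.vram = bytearray(0x2000)   # 0x8000-0x9FFF: video RAM
        self.wram = bytearray(0x2000)   # 0xC000-0xDFFF: work RAM
        self.hram = bytearray(0x7F)     # 0xFF80-0xFFFE: high RAM

    def read(self, address: int) -> int:
        # No intermediate "region" value is built; each branch indexes its array directly.
        if address < 0x8000:
            return self.rom[address]
        if address < 0xA000:
            return self.vram[address - 0x8000]
        if 0xC000 <= address < 0xE000:
            return self.wram[address - 0xC000]
        if 0xFF80 <= address < 0xFFFF:
            return self.hram[address - 0xFF80]
        return 0xFF  # regions not modelled in this sketch read as 0xFF
```

The contrast is with a two-step design that first maps an address to a region value and then matches on that value to find the array, which adds an extra call and an extra branch to every single memory access.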

There were more times where I moved to struct DUs or went with other not-as-F#-friendly approaches. The PPU was the point where optimisation became necessary, and I had to abandon idiomatic F# to an extent.

I slowly improved performance by spending time regularly looking at the profiler, eventually getting it up to about 120 FPS. But then I found the biggest improvement in FPS. The fix? Turning off the debug build, taking the emulator to a lightning-fast 1000 FPS. It took me embarrassingly long to realise that debug mode is that much worse than release mode. I continued to regularly monitor performance and tweak things even up until the end.

Just looking at FPS numbers in the console didn’t seem like a great way to measure performance. So partway through the project I added a BenchmarkDotNet project to measure desktop performance, and later made a simple web benchmarker using Node.js to perform similar benchmarks to estimate web browser performance.

The benchmarks all used the following demo ROMs, used to test realistic scenarios:

  • Flag - a short loop with no sound.
  • Roboto - a long-running (>1 min) demo that uses many visual effects and has sound.
  • Merken - similar to Roboto, but uses a memory banked ROM to test the memory.

Here is desktop FPS performance on both a Ryzen 9 7900 Windows PC and an M4 MacBook Air.

| CPU          | Flag | Roboto | Merken |
|--------------|------|--------|--------|
| Ryzen 9 7900 | 1785 | 1943   | 1422   |
| Apple M4     | 1907 | 2508   | 1700   |

And here is the web FPS performance.

| CPU          | Flag | Roboto | Merken |
|--------------|------|--------|--------|
| Ryzen 9 7900 | 646  | 883    | 892    |
| Apple M4     | 779  | 976    | 972    |

Fame Boy performs decently well on both platforms, running far above the roughly 60 FPS needed for full-speed play.

Surprisingly, the APU (sound) had more of an impact on emulator performance than the PPU. Disabling the PPU increases desktop performance by about 250 FPS, while disabling the APU increases it by around 500 FPS.

No code is free from the influence of AI these days, even learning projects. In general I strive to be transparent about my AI usage, and so I wanted to comment on how I used it and my experience with it in a purely learning project.

Throughout the process I tried to mostly use AI as an aid. I regularly asked it for code reviews, as a wall to bounce ideas off of, and to explain any terse technical documents. I tried to minimise the use of AI-written code though. I wanted to make something that I can show to people and be proud of. Code, for humans, by a human. If I wanted nothing more than an emulator I could’ve just shared the prompt.

There were two noteworthy cases with AI in my project. One was at the end where I decided to just burn some tokens, and unleashed the CLI on my repo to try to find performance improvements. I gave it some ideas, and asked it to try anything else it wanted to. It did really well, more than doubling the performance for some of the benchmarks (PR link with details). It did introduce some bugs that I found and had to fix though. Notably, one of the bigger performance wins (only updating STAT on mode/LY transitions) broke some games and demos that relied on more frequent updates (fix commit).

The other was a case where AI actually saved this project when I had nearly given up. If you look through the git history in my repo, you’ll find a rather large gap at one point. I call it the “timer winter”.

The timer winter

It wasn’t that I didn’t work on the emulator, I was just stuck on a bug. I could never get past the copyright screen in Tetris, no matter what I tried. I probably spent over 20 hours debugging, scanning the emu-dev Discord, creating tests, and even throwing the issue at earlier AI models. Nothing worked. But then after a few weeks away from the emulator I tried Claude Opus, and it found the issue in just a few minutes. The fix?

email me if you have any questions or comments.
