How I stopped worrying and started loving the Assembly

Original link: https://medium.com/@jonas.eschenburg/how-i-stopped-worrying-and-started-loving-the-assembly-4fd00e786c60

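## Retro computing and VoxelSpace on the Atari ST: summary

Jonas Eschenburg is a software developer whose passion for computers began with an Atari ST in 1986. He describes in detail his return to retro programming. Driven by the loss of creative control he felt on large projects, he set himself a challenging goal: reproducing the VoxelSpace terrain rendering of the PC game Comanche: Maximum Overkill on the limited hardware of the Atari ST. The Atari ST is a 16-bit machine with 1 MB of RAM and a Motorola 68000 CPU; compared with PCs of the same era it offers a simpler programming environment, with direct hardware access and a relatively elegant assembly language. Eschenburg uses modern tools (VSCode, a GCC cross-compiler and image manipulation software) to overcome the limitations of the original development process. The project required extensive optimization that went beyond algorithmic improvements down to fine-grained assembly-level tuning. He focused on maximizing efficiency inside tight loops, using techniques such as simplified arithmetic, look-up tables and pseudo-SIMD operations on the 68000's 32-bit registers. In the end, Eschenburg achieved interactive performance, showing that with ingenuity and modern tools even ambitious projects can be realized on classic hardware, rekindling the magic of early computing.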

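## Rediscovering the joy of assembly programming

A recent Hacker News discussion highlighted a thriving retro computing scene and a renewed interest in assembly language programming. Users shared their experiences with classic machines such as the Oric-1, ZX Spectrum, Commodore 64 and Atari ST, and noted that these "ancient" platforms still see a surprising amount of new development. The conversation touched on the benefits of understanding assembly: appreciating how bloated modern software has become, gaining a deeper understanding of how computers work, and the Zen-like satisfaction of low-level coding. Ben Eater's 6502 series and books on 68000 assembly were recommended for beginners. While many acknowledged the complexity of assembly, some commenters stressed that assembly itself is not *difficult*, just less convenient than modern languages. Some even find it easier than navigating a complex modern web stack. Overall, the discussion paints a picture of a vibrant enthusiast community finding joy and intellectual stimulation in the fundamentals of computing.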

Original article
17 min read

Oct 17, 2025

Or why robots are playing DOOM now

Who am I

Hello, my name is Jonas Eschenburg. I feel like my path was pre-determined ever since my dad brought home this wonderful computer back in 1986. The computer was an Atari ST, originally intended for serious tasks like writing letters and articles. I soon found out I could also have a lot of fun with it. I was six years old back then and what started with playing games like Megaroids ended up with me wanting to learn programming in order to create my own games. By the age of ten I was creating simple games and demos in GfA BASIC.

Later in my life I studied computer science and pursued a career as a software developer. At KUKA, I help create system software for industrial robots. As part of a large company it isn’t always easy to keep the spirit as my influence on the products is obviously limited. At some point, I missed the feeling of actually having the world at my fingertips. It was time to go back to the roots.

What this is about

This is an experience report of a voyage into the land of retro programming. If I’m doing it right, I will instill a certain sensation in you, the fascination that kept me going and that made me write this article. An article about how to write software for the Atari ST, a computer system launched in 1985.

The Atari ST was part of the second major wave of home computers. The first wave, based on 8 bit CPUs, brought icons such as the Commodore C64 and the Sinclair Spectrum into the living rooms. The second wave was based on 16 bit CPUs and gave birth to icons such as the Apple Macintosh, the Commodore Amiga and, of course, the Atari ST.

Atari ST 1040 STF with color monitor Atari SC1224 as released in 1985.
By © Bill Bertram, 2006 — Own work, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=500910

A typical Atari ST (like the 1040 STFM) would run on the Motorola 68000 CPU clocked at 8 MHz. It would sport 1 MB of RAM and load software from 3½ inch floppy disks. Its graphics capabilities would comprise a monochrome “high resolution” mode with 640x400 pixels on screen and a 16 color mode in 320x200 that turned out most popular for games.

The Atari ST’s GEM desktop featured an interface very similar to what Xerox and Apple had pioneered, with a file manager, desktop icons, drop-down menus and overlapping windows. Here is the monochrome high-resolution version.

If you don’t happen to own an actual Atari ST, you can simply emulate one on your modern computer. In fact, emulators have been around since the 90s. A good one is Hatari, which is available for most modern platforms. Using an emulator also gives you access to modern development tools. Whereas back in the day you had to use clunky development software loaded from floppy, nowadays you can do everything in VSCode and use a modern GCC compiler. You have access to modern image manipulation tools like GIMP and you can even ask the AI of your choice for help.

Hello World

Did I convince you? So let’s take our first steps in the shiny world of retro computing. Let’s open a text editor and write the famous Hello World (hello.c):

#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}

In order to compile, we need a C compiler. Luckily, there has been an ongoing effort to maintain an up-to-date GNU toolchain. You can download precompiled binaries for all major platforms (including the Atari ST itself, but that’s not what I’d recommend) on Thorsten Otto’s Crossmint site. You need GCC and binutils to compile hello.c:

m68k-atari-mint-gcc hello.c -o hello.prg

And there you have it: the iconic Hello World compiled for a 16 bit CPU.

HELLO.PRG running on the Atari ST. Due to the program terminating right after startup, the message is only visible for a fraction of a second.

What makes programming for the Atari ST so special?

One of the things I appreciate most about programming for the Atari ST is its simplicity. The Motorola 68000 CPU, often labeled as a 16-bit processor, actually offers much more. It features a set of sixteen 32-bit registers and supports flat 32-bit addressing and arithmetic operations, albeit with slightly reduced performance compared to 16-bit operations. This stands in stark contrast to the PC platform of the time, which was limited by the Intel 8086’s small set of 16-bit registers and a segmented memory model that made programming far more complex. In comparison, the M68K assembly language is quite elegant and user-friendly.

The system lacks virtual memory and offers no memory protection. Most of the hardware is accessible through memory-mapped I/O, allowing any program to directly interact with the registers of hardware components. This direct access becomes particularly valuable when working with graphics or sound. Additionally, there is no dedicated video memory. While an API exists for hardware-independent graphics operations, developers can also write directly to video memory if they choose.

Another notable aspect is the near absence of a strict separation between user space and kernel space. For instance, it’s possible to install interrupt service routines directly from within a user-space program. This level of control and openness makes the ST a uniquely accessible and rewarding platform for low-level programming.

All that said, what really distinguishes programming for such a system from modern software development is of course not the system itself, but the radically reduced scope of viable projects. No web-based frontend, no HTTP, no client-server interaction, no JSON, no multithreaded async I/O backend. Just a computer and all its hardware, for the programmer to abuse.

A VoxelSpace adventure

My ambitions started with a game I played on my friend’s PC for the first time most likely around 1993. Comanche: Maximum Overkill was an action-filled military helicopter flight simulator that earned its place in history mostly for one aspect. Whereas previous flight simulators rendered the landscape and all surroundings in a very abstract way using flat-shaded polygons, Comanche introduced visuals unseen so far. Suddenly there were rolling hills, narrow canyons and steep mountains and meandering rivers to smoothly fly over. All due to an algorithm called VoxelSpace.

Screenshot of Comanche: Maximum Overkill

Comanche quickly earned a reputation as a system seller for high-end PCs (which at the time most likely featured an Intel 486 processor and between 4 and 8 MB of RAM). The time of humble systems like the Atari ST or its more advanced contender, the Commodore Amiga, was over and would never come back. Comanche would never run on these machines and nothing would ever come close.

One day in 2024 I had the crazy idea to make VoxelSpace run on the Atari ST. That was a terrible idea and those are sometimes the best.

I found terrain and texture data of the original Comanche in an easily usable format (Sebastian Macke’s excellent GitHub page). The data still needed a bit of fiddling in order to be usable. I created my own tooling based on the GIMP image manipulation program and its LISP-based scripting language “Script-Fu”. Using the Crossmint GCC compiler I created a hacky little demo program for the ST that loaded the terrain data into memory and brought it to the screen. This is what it looked like:

Picture of first voxel-st version

Of course it was slow but it did look nice. I even posted a screen capture on Twitter and received very positive responses. I kept iterating and optimizing and finally ended up with something that ran at interactive speeds. Had it been a game released in 1990 it would have blown a lot of minds.

A later version of voxel-st with a minimap and a lighted instrument panel.

Graphics on the Atari ST

But how did I actually get there? How do you even draw to the screen on an Atari ST? Turns out you just have to write data into the part of RAM that is used as video RAM. Easy, right? In fact, it appears easy, but as with many things in retro computing, there’s a twist. Time for some 16-bit computing folklore (technical content ahead!).

Remember the Atari ST has a resolution of 320x200 pixels in 16 colors? In order to distinguish 16 colors, you need 4 bits, right? This is a little bit unfortunate as the smallest addressable unit in a computer is the byte, which is famously composed of 8 bits. So there are two sane ways to map pixels to bytes:

  1. Use only the lower four bits of every byte and leave the upper four bits unused (therefore wasting four bits of information per pixel but keeping a nice 1:1 pixel-to-byte mapping)
  2. Stuff two pixels into one byte. This doesn’t waste any memory but comes at the expense of generating an additional overhead when drawing single pixels.

Unfortunately, the hardware designers at Atari went for option 3, which is an utterly insane choice: they used interleaved bit-planes.

The Atari ST’s video memory is laid out in an interleaved bitplane arrangement.

To understand a bit-plane, you decompose pixels into their bits and group the corresponding bits of neighboring pixels together. For example, take bit 0 from each of pixel 0, pixel 1, pixel 2, and so on. There are four bits per pixel, hence there are four distinct maps of bits (this is actually where the term “bitmap” comes from). Each map represents one bit position across all pixels. It helps to visualize these maps layered on top of each other, hence the term “bit-planes”.

To make matters more complicated, Atari’s designers came up with the idea of interleaving the bit-planes, making the graphics hardware simpler to implement. The interleaving works as follows: every group of 16 horizontally neighboring pixels is encoded into 8 bytes in the following way:

  • Bytes 0 and 1 contain the 16 bits of bitplane 0
  • Bytes 2 and 3 contain the 16 bits of bitplane 1
  • Bytes 4 and 5 contain the 16 bits of bitplane 2
  • Bytes 6 and 7 contain the 16 bits of bitplane 3

So every 8 bytes encode a span of 16 pixels. A screen line is 320 pixels wide so it takes 160 bytes to encode. There are 200 screen lines so a full screen occupies exactly 160*200 = 32000 bytes. Why group 16 pixels and not 8? That’s because the ST is a 16-bit machine and has a data bus that is 16 bits wide, which results in a general preference of 16-bit words over 8-bit bytes. However, this makes programming graphics on this platform a bit of a challenge.

Enter the assembly

When programming for a retro system such as the Atari ST with its (ridiculously slow by today’s standards) 8 MHz processor, efficiency is immensely important. While the GNU C Compiler outputs reasonably optimized code, it is very beneficial to keep a close eye on the actual assembly generated. I already knew some 68000 assembly when I started the VoxelSpace project, but I learned a lot more by analyzing what the compiler generated and refining the original C code to guide the compiler toward better output. Reading assembly code was difficult at first but, as with all things, it became easier with time.

When you study Computer Science, you are made aware of the dangers of premature optimization. You learn to focus on algorithmic improvements first, lowering the so-called asymptotic complexity, which is expressed in the famous big-O notation. An algorithm that runs in O(N) scales better than one that runs in O(N²). However, the VoxelSpace algorithm is pretty much resistant to asymptotic analysis. It’s already bounded by O(N), where N is the number of samples to take from the terrain in each frame, and that’s a fixed number you can choose to suit your needs.

Now is the time for assembly-level optimization. There are basically two ways to do that:

  • By inspecting the assembly code generated by the compiler. When compiling with option “-fverbose-asm”, you will get assembly code with a comment per line that explains which C variables a particular register maps to.
  • By embedding assembly code into your C code. This is most easily done using GCC’s inline assembly feature.
Using GCC’s inline assembly syntax can be challenging but unlocks the full potential of your target CPU.

Over the course of a couple of weeks I managed to speed up my VoxelSpace demo by at least an order of magnitude. Through experimentation and research I learned many optimization tricks. Here are just a few:

  • Concentrate on your inner loops. Profile as well as you can. Analyze the generated assembly code. Don’t just trust the compiler!
  • Simplify arithmetic. Avoid multiplication and division at all costs.
  • Use look-up tables. Memory access is fast relative to the CPU on these old machines, and caches only arrived in later CPUs (that’s very different today).
  • Avoid bit-shift operations. The original Motorola 68000 has no barrel shifter, so shifting by larger amounts is actually more expensive than shifting by just a few positions. Shift-left by one or two bits can be replaced with addition.
  • Pre-shift data that otherwise needs to be bit-shifted in tight loops.
  • Economize on registers. Using fewer registers avoids having to spill registers to the stack.
  • Use poor-man’s SIMD: Use 32-bit registers to perform two 16-bit additions at once. This comes in very handy if you do any kind of vector arithmetic and can live with the reduced precision.
  • Use 16-bit integers. While the Motorola 68000 CPU is perfectly able to perform arithmetic on 32-bit registers, working with the 16-bit variants of operations such as “add”, “asl” or “eor” can save a few cycles in ultra-tight loops.

Modern Tooling

One might be inclined to think that programming for a retro system is a backward experience, using outdated tools and having a high barrier of entry. Thanks to the efforts of many great people, this is not at all the case. In fact, some of the possibilities are more advanced than what software developers on current systems experience daily.

It starts with the Integrated Development Environment. The recommendation is to use Visual Studio Code and Microsoft’s C/C++ extension which can be easily configured to use a cross-compiler.
