Or why robots are playing DOOM now
Who am I
Hello, my name is Jonas Eschenburg. I feel like my path was pre-determined ever since my dad brought home this wonderful computer back in 1986. The computer was an Atari ST, originally intended for serious tasks like writing letters and articles. I soon found out I could also have a lot of fun with it. I was six years old back then and what started with playing games like Megaroids ended up with me wanting to learn programming in order to create my own games. By the age of ten I was creating simple games and demos in GfA BASIC.
Later in life I studied computer science and pursued a career as a software developer. At KUKA, I help create system software for industrial robots. As part of a large company it isn’t always easy to keep the spirit alive, as my influence on the products is obviously limited. At some point, I missed the feeling of actually having the world at my fingertips. It was time to go back to the roots.
What this is about
This is an experience report of a voyage into the land of retro programming. If I’m doing it right, I will instill a certain sensation in you, the fascination that kept me going and that made me write this article. An article about how to write software for the Atari ST, a computer system launched in 1985.
The Atari ST was part of the second major wave of home computers. The first wave, based on 8-bit CPUs, brought icons such as the Commodore C64 and the Sinclair Spectrum into the living rooms. The second wave was based on 16-bit CPUs and gave birth to icons such as the Apple Macintosh, the Commodore Amiga and, of course, the Atari ST.
By © Bill Bertram, 2006 — Own work, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=500910
A typical Atari ST (like the 1040 STFM) would run on the Motorola 68000 CPU clocked at 8 MHz. It would sport 1 MB of RAM and load software from 3½ inch floppy disks. Its graphics capabilities would comprise a monochrome “high resolution” mode with 640x400 pixels on screen and a 16 color mode in 320x200 that turned out most popular for games.
If you don’t happen to own an actual Atari ST, you can simply emulate one on your modern computer. In fact, emulators have been around since the 90s. A good one is Hatari, which is available for most modern platforms. Using an emulator also gives you access to modern development tools. Whereas back in the day you had to use clunky development software loaded from floppy, nowadays you can do everything in VS Code and use a modern GCC compiler. You have access to modern image manipulation tools like GIMP and you can even ask the AI of your choice for help.
Hello World
Did I convince you? So let’s take our first steps in the shiny world of retro computing. Let’s open a text editor and write the famous Hello World (hello.c):
#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}
In order to compile, we need a C compiler. Luckily, there has been an ongoing effort to maintain an up-to-date GNU toolchain. You can download precompiled binaries for all major platforms (including the Atari ST itself, but that’s not what I’d recommend) on Thorsten Otto’s Crossmint site. You need GCC and binutils to compile hello.c:
m68k-atari-mint-gcc hello.c -o hello.prg
And there you have it: the iconic Hello World compiled for a 16-bit CPU.
What makes programming for the Atari ST so special?
One of the things I appreciate most about programming for the Atari ST is its simplicity. The Motorola 68000 CPU, often labeled as a 16-bit processor, actually offers much more. It features a set of sixteen 32-bit registers and supports flat 32-bit addressing and arithmetic operations, albeit with slightly reduced performance compared to 16-bit operations. This stands in stark contrast to the PC platform of the time, which was limited by the Intel 8086’s small set of 16-bit registers and a segmented memory model that made programming far more complex. In comparison, the M68K assembly language is quite elegant and user-friendly.
The system lacks virtual memory and offers no memory protection. Most of the hardware is accessible through memory-mapped I/O, allowing any program to directly interact with the registers of hardware components. This direct access becomes particularly valuable when working with graphics or sound. Additionally, there is no dedicated video memory. While an API exists for hardware-independent graphics operations, developers can also write directly to video memory if they choose.
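To make the idea of memory-mapped I/O concrete, here is a minimal sketch that sets the 16-entry color palette by storing words into hardware registers. It is an illustration, not code from the demo: the helper names `st_rgb` and `set_palette` are made up, and the base address `0xFFFF8240` and the 3-bits-per-channel `0x0RGB` word layout are assumptions about a plain ST. On real hardware the program would most likely also have to switch the CPU to supervisor mode before touching these registers.

```c
#include <stdint.h>

/* Assumed address of the first of 16 palette registers on a plain ST. */
#define ST_PALETTE_BASE 0xFFFF8240UL

/* Pack 3-bit red, green and blue components (0..7) into a 0x0RGB word. */
static uint16_t st_rgb(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)(((r & 7u) << 8) | ((g & 7u) << 4) | (b & 7u));
}

/* Copy 16 colors into a block of palette registers.
   'regs' is volatile so the compiler really performs every store. */
static void set_palette(volatile uint16_t *regs, const uint16_t colors[16])
{
    for (int i = 0; i < 16; i++)
        regs[i] = colors[i];
}
```

On the ST the call would look like `set_palette((volatile uint16_t *)ST_PALETTE_BASE, colors);`. Because the register block is passed in as a pointer, the same function can be exercised on a modern machine by pointing it at an ordinary array.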
Another notable aspect is the near absence of a strict separation between user space and kernel space. For instance, it’s possible to install interrupt service routines directly from within a user-space program. This level of control and openness makes the ST a uniquely accessible and rewarding platform for low-level programming.
All that said, what really distinguishes programming for such a system from modern software development is of course not the system itself, but the radically reduced scope of viable projects. No web-based frontend, no HTTP, no client-server interaction, no JSON, no multithreaded async I/O backend. Just a computer and all its hardware, for the programmer to abuse.
A VoxelSpace adventure
My ambitions started with a game I played on my friend’s PC for the first time most likely around 1993. Comanche: Maximum Overkill was an action-filled military helicopter flight simulator that earned its place in history mostly for one aspect. Whereas previous flight simulators rendered the landscape and all surroundings in a very abstract way using flat-shaded polygons, Comanche introduced visuals unseen so far. Suddenly there were rolling hills, narrow canyons and steep mountains and meandering rivers to smoothly fly over. All due to an algorithm called VoxelSpace.
Comanche quickly earned a reputation as a system seller for high-end PCs (which at the time most likely featured an Intel 486 processor and between 4 and 8 MB of RAM). The time of humble systems like the Atari ST or its more advanced contender, the Commodore Amiga, was over and would never come back. Comanche would never run on these machines and nothing would ever come close.
One day in 2024 I had the crazy idea to make VoxelSpace run on the Atari ST. That was a terrible idea and those are sometimes the best.
I found terrain and texture data of the original Comanche in an easily usable format (Sebastian Macke’s excellent GitHub page). The data still needed a bit of fiddling in order to be usable. I created my own tooling based on the GIMP image manipulation program and its LISP-based scripting language “Script-Fu”. Using the Crossmint GCC compiler I created a hacky little demo program for the ST that loaded the terrain data into memory and brought it to the screen. This is what it looked like:
Of course it was slow but it did look nice. I even posted a screen capture on Twitter and received very positive responses. I kept iterating and optimizing and finally ended up with something that ran at interactive speeds. Had it been a game released in 1990 it would have blown a lot of minds.
Graphics on the Atari ST
But how did I actually get there? How do you even draw to the screen on an Atari ST? Turns out you just have to write data into the part of RAM that is used as video RAM. Easy, right? In fact, it appears easy, but as with many things in retro computing, there’s a twist. Time for some 16-bit computing folklore (technical content ahead!).
Remember the Atari ST has a resolution of 320x200 pixels in 16 colors? In order to distinguish 16 colors, you need 4 bits, right? This is a little bit unfortunate as the smallest addressable unit in a computer is the byte, which is famously composed of 8 bits. So there are two sane ways to map pixels to bytes:
- Use only the lower four bits of every byte and leave the upper four bits unused (therefore wasting four bits of information per pixel but keeping a nice 1:1 pixel-to-byte mapping)
 - Stuff two pixels into one byte. This doesn’t waste any memory but comes at the expense of generating an additional overhead when drawing single pixels.
 
Unfortunately, the hardware designers at Atari went for option 3, which is an utterly insane choice: they used interleaved bit-planes.
To understand a bit-plane, you decompose pixels into their bits and group the corresponding bits of neighboring pixels together. For example, take bit 0 from each of pixel 0, pixel 1, pixel 2, and so on. There are four bits per pixel, hence there are four distinct maps of bits (this is actually where the term “bitmap” comes from). Each map represents one bit position across all pixels. It helps to visualize these maps layered on top of each other, hence the term “bit-planes”.
To make matters more complicated, Atari’s designers came up with the idea of interleaving the bit-planes, making the graphics hardware simpler to implement. The interleaving works as follows: every group of 16 horizontally neighboring pixels is encoded into 8 bytes in the following way:
- Bytes 0 and 1 contain the 16 bits of bitplane 0
 - Bytes 2 and 3 contain the 16 bits of bitplane 1
 - Bytes 4 and 5 contain the 16 bits of bitplane 2
 - Bytes 6 and 7 contain the 16 bits of bitplane 3
 
So every 8 bytes encode a span of 16 pixels. A screen line is 320 pixels wide so it takes 160 bytes to encode. There are 200 screen lines so a full screen occupies exactly 160*200 = 32000 bytes. Why group 16 pixels and not 8? That’s because the ST is a 16-bit machine and has a data bus that is 16 bits wide, which results in a general preference of 16-bit words over 8-bit bytes. However, this makes programming graphics on this platform a bit of a challenge.
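The layout described above can be sketched as a pair of pixel accessors. This is a deliberately naive illustration rather than the code used in the demo: the function names are made up, the frame buffer is just a 32000-byte array, and the sketch assumes that the leftmost pixel of each group sits in the most significant bit of each big-endian word.

```c
#include <stdint.h>

enum { SCREEN_W = 320, SCREEN_H = 200, BYTES_PER_LINE = 160 };

/* Set pixel (x, y) to a color index 0..15 in a 32000-byte low-res buffer. */
static void put_pixel(uint8_t *screen, int x, int y, unsigned color)
{
    /* Each group of 16 pixels occupies 8 bytes: one 16-bit word per plane. */
    uint8_t *group = screen + y * BYTES_PER_LINE + (x >> 4) * 8;
    int in_group = x & 15;                    /* position within the group   */
    uint8_t *byte = group + (in_group >> 3);  /* first byte = pixels 0..7    */
    uint8_t mask = (uint8_t)(0x80 >> (in_group & 7)); /* leftmost pixel = MSB */

    for (int plane = 0; plane < 4; plane++) {
        if (color & (1u << plane))
            byte[plane * 2] |= mask;            /* set this pixel's plane bit   */
        else
            byte[plane * 2] &= (uint8_t)~mask;  /* clear this pixel's plane bit */
    }
}

/* Read back the color index of pixel (x, y). */
static unsigned get_pixel(const uint8_t *screen, int x, int y)
{
    const uint8_t *byte = screen + y * BYTES_PER_LINE
                        + (x >> 4) * 8 + ((x & 15) >> 3);
    uint8_t mask = (uint8_t)(0x80 >> (x & 7));
    unsigned color = 0;

    for (int plane = 0; plane < 4; plane++)
        if (byte[plane * 2] & mask)
            color |= 1u << plane;
    return color;
}
```

Setting a single pixel touches four separate bytes, which is exactly why drawing pixel by pixel is so costly in this format and why fast routines tend to work on whole 16-pixel groups at a time.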
Enter the assembly
When programming for a retro system such as the Atari ST with its (ridiculously slow by today’s standards) 8 MHz processor, efficiency is immensely important. While the GNU C Compiler outputs reasonably optimized code, it is very beneficial to keep a close eye on the actual assembly generated. I already knew some 68000 assembly when I started the VoxelSpace project, but I learned a lot more by analyzing what the compiler generated and refining the original C code to guide the compiler toward better output. Reading assembly code was difficult at first but, as with all things, it became easier with time.
When you study computer science, you are made aware of the dangers of premature optimization. You learn to focus on algorithmic improvements first, that is, on reducing the asymptotic complexity expressed in the famous big-O notation. An algorithm that runs in O(N) scales better than one that runs in O(N²). However, the VoxelSpace algorithm offers little room for asymptotic improvement. It is already O(N), where N is the number of samples taken from the terrain in each frame, and that’s a number you can choose to suit your needs.
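To see why the cost is linear in the number of samples, here is a heavily simplified sketch of how a VoxelSpace-style renderer could draw a single screen column. It is not the code from my demo: the function name, the tiny 64x64 map, the integer ray steps and the projection formula are all assumptions chosen to keep the example short. The ray is marched front to back, each terrain sample is projected to a screen row, and only the part that rises above everything drawn so far is filled in.

```c
#include <stdint.h>

enum { MAP_SIZE = 64, SCREEN_H = 200 };  /* MAP_SIZE must be a power of two */

/* Render one screen column front to back.
   Returns the number of terrain samples that were taken. */
static int render_column(const uint8_t *height_map, const uint8_t *color_map,
                         int cam_x, int cam_y, int cam_height,
                         int step_x, int step_y, int num_samples,
                         int horizon, int scale, uint8_t *column)
{
    int y_top = SCREEN_H;  /* rows at or below y_top are already drawn */
    int samples = 0;

    for (int z = 1; z <= num_samples && y_top > 0; z++) {
        /* Walk along the ray; the map wraps around at its edges. */
        int mx = (cam_x + step_x * z) & (MAP_SIZE - 1);
        int my = (cam_y + step_y * z) & (MAP_SIZE - 1);
        int h = height_map[my * MAP_SIZE + mx];
        samples++;

        /* Perspective projection: farther samples move toward the horizon. */
        int y = horizon + (cam_height - h) * scale / z;
        if (y < 0)
            y = 0;

        /* Draw only what rises above the terrain drawn so far. */
        if (y < y_top) {
            uint8_t c = color_map[my * MAP_SIZE + mx];
            for (int row = y; row < y_top; row++)
                column[row] = c;
            y_top = y;
        }
    }
    return samples;
}
```

Every sample costs one map lookup and one projection, so the work per frame is the number of columns times the number of samples per column. The division in the projection is the kind of operation a real implementation for the ST would want to replace, for example with a table indexed by distance.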
Now is the time for assembly-level optimization. There are basically two ways to do that:
- By inspecting the assembly code generated by the compiler. When compiling with option “-fverbose-asm”, you will get assembly code with a comment per line that explains which C variables a particular register maps to.
 - By embedding assembly code into your C code. This is most easily done using GCC’s inline assembly feature.
 
Over the course of a couple of weeks I managed to speed up my VoxelSpace demo by at least an order of magnitude. Through experimentation and research I learned many optimization tricks. Here are just a few:
- Concentrate on your inner loops. Profile as well as you can. Analyze the generated assembly code. Don’t just trust the compiler!
 - Simplify arithmetic. Avoid multiplication and division at all costs.
 - Use look-up tables. Memory access is relatively fast on old CPUs and caches were only introduced in later CPUs (that’s very different today).
 - Avoid bit-shift operations. The original Motorola 68000 has no barrel shifter, so every additional bit position costs extra cycles and shifting by larger amounts is more expensive than shifting by just a few positions. A shift-left by one or two bits can be replaced with additions.
 - Pre-shift data that otherwise needs to be bit-shifted in tight loops.
 - Economize on registers. Using fewer registers avoids having to spill registers to the stack.
 - Use poor-man’s SIMD: Use 32-bit registers to perform two 16-bit additions at once. This comes in very handy if you do any kind of vector arithmetic and can live with the reduced precision.
 - Use 16-bit integers. While the Motorola 68000 CPU is perfectly able to perform arithmetic on 32-bit registers, working with the 16-bit variants of operations such as “add”, “asl” or “eor” can save a few cycles in ultra-tight loops.
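The “poor-man’s SIMD” trick from the list above can be sketched in a few lines of C. The helper names are made up for this illustration, and the sketch assumes unsigned 16-bit values that stay small enough for the lower half never to overflow. If it does overflow, the carry leaks into the upper half, which is the price you pay for the trick.

```c
#include <stdint.h>

/* Pack two 16-bit values into the upper and lower half of one 32-bit word. */
static uint32_t pack16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Extract the upper and lower 16-bit halves again. */
static uint16_t hi16(uint32_t v) { return (uint16_t)(v >> 16); }
static uint16_t lo16(uint32_t v) { return (uint16_t)(v & 0xFFFFu); }

/* One 32-bit addition performs both 16-bit additions at once.
   Caveat: a carry out of the lower half spills into the upper half. */
static uint32_t packed_add(uint32_t a, uint32_t b)
{
    return a + b;
}
```

A typical use is stepping a two-dimensional position along a ray: keep x in the upper half and y in the lower half, and advance both with a single `packed_add(pos, step)` per iteration instead of two separate additions.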
 
Modern Tooling
One might be inclined to think that programming for a retro system is a backward experience, using outdated tools and having a high barrier of entry. Thanks to the efforts of many great people, this is not at all the case. In fact, some of the possibilities are more advanced than what software developers on current systems experience daily.
It starts with the Integrated Development Environment. The recommendation is to use Visual Studio Code and Microsoft’s C/C++ extension which can be easily configured to use a cross-compiler.