It all began with the Speccy - the ZX Spectrum 48K+, to be precise.
I got it when I was 13 years old - the best gift ever.
An entire career has passed since then.
And one of the habits I picked up along the way was fooling around in my free time with SW-only 3D graphics.
In fact, a few years ago, I ported the main logic into an ATmega328P microcontroller, implementing "points-only" 3D rendering, and driving an OLED display via the SPI interface... at the magnificent resolution of 128x64 :-)
So the path to even more useless tinkering was clear:
I just HAD to make this work for the Speccy, too! :-)
And as you can see in this repository... I just did:
The resulting statue.tap file is also committed in the repo,
in case you just want to quickly run this in your FUSE emulator.
I also added a second 3D model of a sphere - run sphere.tap
in FUSE to see the result:
$ fuse -g tv3x tap/statue.tap
...
$ fuse -g tv3x tap/sphere.tap
There's a simple Makefile driving the build process - so once
you have z88dk installed, just type:
The cross-compiler used for the compilation is z88dk. If it's not packaged in your distribution, you can easily build it from source:
mkdir -p ~/Github/
cd ~/Github/
git clone https://github.com/z88dk/z88dk/
cd z88dk
git submodule init
git submodule update
./build.sh -p zx
You can now use the cross compiler - by just setting up your environment (e.g. in your .profile):
export PATH=$HOME/Github/z88dk/bin:$PATH
export ZCCCFG=$HOME/Github/z88dk/lib/config
Since the Speccy's brain is even tinier than the ATmega328P's, I had to take even more liberties: I changed the computation loop to orbit the viewpoint (instead of rotating the statue), thus leading to the simplest possible equations:
int wxnew = points[i][0]-mcos;
int x = 128 + ((points[i][1]+msin)/wxnew);
int y = 96 - (points[i][2]/wxnew);
No multiplications, no shifts; just two divisions, and a few additions/subtractions.
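To make the three lines above easier to experiment with on a PC, here is a minimal Python sketch of the same per-point computation. The helper names `c_div` and `project`, and the guard against a non-positive depth, are assumptions added for illustration; they are not taken from the repository. Note that C integer division truncates toward zero, whereas Python's `//` floors, hence the helper.

```python
def c_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, like C's `/` on ints."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def project(point, msin: int, mcos: int):
    """Project one pre-scaled point; returns (x, y), or None if depth <= 0.

    `point` is indexed exactly like points[i][0..2] in the C snippet above.
    """
    wxnew = point[0] - mcos              # depth relative to the orbiting viewpoint
    if wxnew <= 0:                       # guard added for this sketch only
        return None
    x = 128 + c_div(point[1] + msin, wxnew)
    y = 96 - c_div(point[2], wxnew)
    return x, y
```

For example, `project((200, 1000, -500), 0, 0)` gives `(133, 98)`: two divisions, plus a handful of additions and subtractions.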
If you're wondering how this can possibly be a valid 3D projection, you can read the full-of-math "for nerds" section below :-)
But that was not the end - if one is to reminisce, one must go all the way!
So after almost 4 decades, I wrote Z80 assembly again - and made much better use of the Z80 registers than any C compiler can.
I also replaced the two costly divisions with two multiplications
using a lookup table of reciprocals. In fact I pepper-sprayed
"page-based" lookups ( ld h, hi-byte-of-table-address; load index
into L; read from (HL) ) for a final phenomenal speedup...
- from 6.2 frames per sec (in C)
- ...to 14.0 frames per sec (in optimised ASM)
Happiness :-)
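The reciprocal trick can be sketched in a few lines of Python. This is only an illustration of the general technique, not the repository's assembly: the 16-bit fraction (`SHIFT`), the table size (`MAX_DEPTH`) and the function name `div_via_recip` are all assumptions made for this example.

```python
SHIFT = 16          # assumed: reciprocals stored as 0.16 fixed-point fractions
MAX_DEPTH = 512     # assumed upper bound on the divisor (depth) for this sketch

# RECIP[d] ~ 2^16 / d, rounded up so the product never undershoots.
# Index 0 is a placeholder; dividing by zero is never valid.
RECIP = [0] + [(1 << SHIFT) // d + 1 for d in range(1, MAX_DEPTH)]


def div_via_recip(n: int, d: int) -> int:
    """Approximate n / d with one table lookup, one multiply and one shift."""
    magnitude = (abs(n) * RECIP[d]) >> SHIFT
    return -magnitude if n < 0 else magnitude
```

With these assumptions, and for |n| below 65536, the result is either the exact truncated quotient or one more than it (in magnitude), which at worst moves a point by one pixel. On the Z80 the lookup itself becomes cheap when the table starts on a 256-byte page boundary, because the index can go straight into L.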
I was also curious about precalculating the entire paths and the screen memory writes - you can see that code in the precompute branch.
As shown in the video above, this version runs at 40 frames per sec - almost 3 times faster than the optimised ASM version, and more than 6 times faster than the C one. It does take a couple of minutes to precompute everything, though. Since I had all the time in the world to precompute, I used the complete equations (for rotating the statue and 3D projecting) in 8.8 fixed-point arithmetic:
The reason for the insane speed is that I precompute the target pixels' video RAM locations and pixel offsets, leaving almost nothing for the final inner loop, except extracting the memory access coordinates from 16 bits/pixel:
- The offset within the 6K of video RAM, in the upper 13 bits
- The pixel (0-7) within that byte, in the lower 3 bits
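The 16-bit layout described in the two bullets above can be sketched as follows. The function names are hypothetical, and treating pixel 0 as the leftmost (most significant) bit of the byte is an assumption of this sketch.

```python
VRAM_PIXEL_BYTES = 6144          # the 6K of pixel data; offsets fit in 13 bits


def pack(offset: int, pixel: int) -> int:
    """Pack a video RAM byte offset (0..6143) and a pixel index (0..7)."""
    return (offset << 3) | pixel


def unpack(word: int):
    """Split a packed 16-bit word back into (offset, pixel)."""
    return word >> 3, word & 7


def blit(vram: bytearray, words) -> None:
    """The whole 'inner loop': unpack each word and set one bit."""
    for word in words:
        offset, pixel = unpack(word)
        vram[offset] |= 0x80 >> pixel   # assumed: pixel 0 is the leftmost bit
```

Since 6143 needs only 13 bits, the largest packed value is `pack(6143, 7) == 0xBFFF`, which fits comfortably in 16 bits.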
It's also worth noting that the inline assembly version of the "blitter" is 3.5 times faster than the C version. And I could optimise it more... but what's the point :-)
Since these are just reads, shifts and writes, I confess I did not expect to see that much of a difference... But clearly, C compilers for the Z80 need all the help they can get :-)
Here we go: From raw floating-point data to ZX Spectrum screen pixels, with every step explained.
The statue model consists of 153 3D points, stored in statue_data.py as
floating-point values. These are processed by points_gen.py into a binary blob
(points.bin) embedded into the Speccy's memory - i.e. "burned" inside the .tap file
and used as-is at runtime. The values are pre-scaled into integers
(multiplied by S = 8960); simply put, integers are used at
runtime to avoid floating-point math on the Z80 - which simply doesn't support it!
The scale factor S = 8960 is uniform across all three axes - for example...
{ 0.131, 0.116, -0.501 } x 8960 -> { 1174, 1039, -4488 }
Note that the actual ranges per axis are asymmetrical: the model is not centered.
To maximize runtime performance, the coordinates are not just scaled; they are
also pre-transformed by points_gen.py before being embedded in the binary.
Here's how.
First, a swap:
tmp = Y; Y = Z; Z = tmp
The storage order becomes [X, Z, Y] instead of [X, Y, Z]. The per-point loop then,
reading from (HL) as it goes, can compute depth and screen-Y first; and skip
computing screen-X for out-of-vertical-bounds points.
It's a simple optimisation that helps performance a lot when we zoom in enough for the statue to go out of bounds.
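Here is a rough Python sketch of that early-out, assuming each stored point is a tuple in the order [depth, screen-Y, screen-X] described above. The function names, the depth guard and the final horizontal bounds check are assumptions of this sketch, not code from the repository.

```python
def c_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, as in C."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def visible_pixels(points, msin: int, mcos: int):
    """Return the (x, y) pixels to plot for one frame."""
    pixels = []
    for depth_c, sy_c, sx_c in points:      # storage order: [X, Z, Y]
        wxnew = depth_c - mcos
        if wxnew <= 0:                       # guard added for this sketch
            continue
        y = 96 - c_div(sy_c, wxnew)          # first division
        if not 0 <= y < 192:
            continue                         # second division is skipped
        x = 128 + c_div(sx_c + msin, wxnew)  # second division
        if 0 <= x < 256:                     # bounds check added for this sketch
            pixels.append((x, y))
    return pixels
```

Every point that falls above or below the screen costs one division instead of two, which is where the saving comes from when zoomed in.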
The coordinates are also transformed into a "screen-ready" fixed-point space:
Component | Raw formula | After axis swap
--------------+--------------------------+--------------------
X' | SE - X_raw / 14 | depth axis
Y' | Y_raw / 9 * 64 | screen-X axis [2]
Z' | Z_raw / 9 * 64 | screen-Y axis [1]
I know, this looks very cryptic - bear with me and keep reading :-)
With SE = 415 ( 256 + MAXX/16 ), and since we scaled by S = 8960:
X' = 415 - X_float * 640
Y' = Y_float * 63715.6
Z' = Z_float * 63715.6
This is what our transformed, final-integer-data look like.
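As a rough illustration of the table above, the pre-transform could look like this in Python. This is not the actual points_gen.py: the function names, and the choice to round once at the end, are assumptions of this sketch.

```python
S = 8960     # uniform scale factor from the text
SE = 415     # depth offset from the text (256 + MAXX/16)


def scale(v: float) -> int:
    """Turn a float coordinate into a pre-scaled integer (rounding assumed)."""
    return round(v * S)


def pretransform(x_raw: int, y_raw: int, z_raw: int):
    """Apply the table's formulas; return the record in storage order.

    Storage order is [X', Z', Y'], so that depth and screen-Y come first.
    """
    x_p = round(SE - x_raw / 14)     # depth axis
    y_p = round(y_raw / 9 * 64)      # screen-X axis, stored at [2]
    z_p = round(z_raw / 9 * 64)      # screen-Y axis, stored at [1]
    return x_p, z_p, y_p
```

For instance, `pretransform(896, 9, 18)` yields `(351, 128, 64)`: the depth is 415 - 896/14 = 351, and the two screen-axis values land in swapped positions.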
The sin/cos table is precomputed in tables_gen.py. To match the orbit radius
and maintain 16-bit precision without runtime scaling, we additionally do this:
sin_val = (sin_raw / 3) * 64
cos_val = cos_raw / 3
...where the raw values are scaled by T = 256:
msin = sin(theta) * T * 64 / 3 = sin(theta) * 5461.3
mcos = cos(theta) * T / 3 = cos(theta) * 85.3
These different scale factors set the camera's orbit radius; I tweaked them until the "orbit" matched the model size.
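A table generator along these lines might look as follows. It is a sketch rather than the real tables_gen.py: the number of angle steps (`STEPS`), the function name and the rounding are assumptions.

```python
import math

T = 256        # raw sin/cos scale from the text
STEPS = 256    # assumed number of angles per full revolution


def make_tables(steps: int = STEPS):
    """Build the msin/mcos tables with the two different scalings."""
    msin, mcos = [], []
    for i in range(steps):
        theta = 2 * math.pi * i / steps
        msin.append(round(math.sin(theta) * T / 3 * 64))   # peaks near 5461
        mcos.append(round(math.cos(theta) * T / 3))        # peaks near 85
    return msin, mcos
```

Both tables stay well inside the signed 16-bit range, so no runtime rescaling is needed.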
But wait - why is sin scaling different to cos?
Well, in the final equations (coming up next), mcos just needs to offset
the camera depth by the orbit radius - which is a modest shift. T/3 = 85.3
is right for that.
msin in contrast, gets divided by wxnew and added to the horizontal screen
position. For a meaningful pixel shift, it needs to be much larger. The x64
multiplier gives T/3 x 64 = 5461.3 - which after the division yields a reasonable
horizontal swing.
Simply put: the asymmetry is intentional! Both are derived from the same angle, but one controls how far the camera is (mcos -> denominator), and the other controls how far the point swings across the screen (msin -> numerator/denominator -> screen pixels).
So in the end, our equations perform the simplest possible projection; there are no multiplications at run-time; only two divisions and a few additions.
wxnew = X' - mcos
y = 96 - Z' / wxnew
x = 128 + (Y' + msin) / wxnew
How does this work? Let's see...
128 and 96 are the mid-point of the Speccy's screen (256x192).
If we expand the equations and factor out 640 from the depth term, we get this:
wxnew = X' - mcos
wxnew = (415 - X_float * 640) - (cos(theta) * 85.3)
wxnew = 640 * (415/640 - X_float - cos(theta) * 85.3/640)
      = 640 * (0.6484 - X_float - cos(theta) * 0.1333)
Now, if we divide all numerators and denominators in the projection division by 640...
y = 96 - Z' / wxnew
x = 128 + (Y' + msin) / wxnew
...the equations become:
                           Z_float * 99.56
y_screen = 96  - ------------------------------------------
                  0.6484 - X_float - cos(theta) * 0.1333
                   Y_float * 99.56 + sin(theta) * 8.533
x_screen = 128 + ------------------------------------------
                  0.6484 - X_float - cos(theta) * 0.1333
And if we define depth as the positive distance:
depth = d0 - X_float - r * cos(theta)
...and...
f  ~ 99.56 px    focal length            = S / 90
d  ~ 8.53        horizontal swing        = T * 64 / (3 * 640)
r  ~ 0.13 units  orbit radius in depth   = T / (3 * 640)
d0 ~ 0.65 units  SE offset               = 415 / 640
...then the full projection equations become:
                   f * Z
y_screen = 96  - ---------
                   depth
                  f * Y + d * sin(theta)
x_screen = 128 + ------------------------
                          depth
It becomes clear now that these have the shape of the standard 3D projection
equations (see the diagram below): focal length times coordinate, divided by depth -
with d*sin(theta) and r*cos(theta) offsetting our camera's viewpoint by the rotation we apply.
Parameter    | Value       | Derivation                    | Role
-------------+-------------+-------------------------------+----------------------------
f            | 99.56 px    | S * 64/9 / 640 = S / 90       | Focal length, controls proj size
d            | 8.53        | T * 64/3 / 640 = 256/30       | Horizontal swing (same units as f * Y)
r            | 0.13 units  | T / 3 / 640 = 256/1920        | Camera orbit radius along depth
d0           | 0.65 units  | SE / 640 = 415 / 640          | Keeps model in front of camera
Screen centre| (128, 96)   | -                             | Half of 256 x 192
View dir     | +X'         | -                             | Camera looks along +X' (towards decreasing X_float)
The camera goes around in a loop - around a point at a specific distance (d0) from the model. Along the depth axis, its position is:
camera_X(theta) = d0 - r * cos(theta) = 0.65 - 0.13 * cos(theta)
SE = 256 + MAXX / 16 = 256 + 2551 / 16 = 256 + 159 = 415
256 screen width, centres the model horizontally
MAXX/16 accounts for the X-range after preprocessing
Without SE, X' = -X_float * 640 could be negative for positive X, flipping the division sign and putting the model behind the camera. SE adds a constant offset so X' stays positive and depth > 0 for all model points.
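The "divide everything by 640" step of the derivation can be sanity-checked numerically. The sketch below uses real-valued division (not the integer division of the runtime code) and computes every constant from S, T and SE as defined in the text; the function names are made up for this example.

```python
import math

S, T, SE = 8960, 256, 415
K = S / 14                      # the common factor (640)


def x_scaled(x_f: float, y_f: float, theta: float) -> float:
    """Screen x using the pre-scaled quantities X', Y', msin, mcos."""
    x_p = SE - x_f * K
    y_p = y_f * S * 64 / 9
    msin = math.sin(theta) * T * 64 / 3
    mcos = math.cos(theta) * T / 3
    return 128 + (y_p + msin) / (x_p - mcos)


def x_normalised(x_f: float, y_f: float, theta: float) -> float:
    """Same projection after dividing numerator and denominator by K."""
    f = S * 64 / 9 / K                               # focal length term
    num = f * y_f + math.sin(theta) * (T * 64 / 3 / K)
    den = SE / K - x_f - math.cos(theta) * (T / 3 / K)
    return 128 + num / den
```

Both functions return the same value for any point and angle, which is all the normalisation claims: the common factor cancels, and only the ratios between the scale factors matter.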
NOTE: the equations below are shown in their C form. The ASM version uses the same logic, substituting the two divisions with multiplications via the reciprocal lookup table.
Float data {X, Y, Z} in `statue_data.py`
|
| Build Pipeline (`points_gen.py` & `tables_gen.py`)
v
Pre-transformed coordinates in `points.bin`
(Scale S=8960, Axis swap, and Screen-space transform applied)
|
| Runtime Loop
v
X' = Precomputed depth axis
Y' = Precomputed screen-X axis (stored at [2])
Z' = Precomputed screen-Y axis (stored at [1])
msin = Precomputed sin(theta) transform
mcos = Precomputed cos(theta) transform
|
| For each frame
v
wxnew = X' - mcos (depth from camera)
y = 96 - Z' / wxnew (perspective -> vertical)
|
| If 0 <= y < 192:
v
x = 128 + (Y' + msin) / wxnew (perspective -> horizontal)
byte_addr = ofs[y] + (x >> 3) (done via scr_ofs lookup table)
bit_mask  = 128 >> (x & 7)           (done via mask lookup table)
set bit in byte
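The last three steps of the diagram rely on the Spectrum's famously interleaved screen layout. The sketch below builds the two lookup tables in Python; the table names mirror the `scr_ofs` and `mask` mentioned in the diagram, but whether the repository stores offsets or absolute addresses is not stated, so storing offsets from the start of screen memory (0x4000) is an assumption here.

```python
SCREEN_BASE = 0x4000     # start of ZX Spectrum pixel memory


def row_offset(y: int) -> int:
    """Byte offset of pixel row y (0..191) from the start of screen memory.

    The rows are interleaved: the screen is split into three 64-row
    thirds, and within each third the character row and the pixel row
    inside the character cell swap places in the address bits.
    """
    return ((y & 0xC0) << 5) | ((y & 0x07) << 8) | ((y & 0x38) << 2)


SCR_OFS = [row_offset(y) for y in range(192)]   # one entry per screen row
MASK = [0x80 >> b for b in range(8)]            # leftmost pixel = bit 7


def plot(vram: bytearray, x: int, y: int) -> None:
    """Set pixel (x, y) in a 6144-byte copy of the pixel memory."""
    vram[SCR_OFS[y] + (x >> 3)] |= MASK[x & 7]
```

Two table reads, one OR and one write per pixel: this is why the lookups pay off compared to computing the interleaved address every time.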
Now all I need to do is wait for my retirement... so I can use my electronics knowledge to revive my Speccy, and test this code on the real thing, not just on the Free Unix Spectrum Emulator :-)
Then again, maybe you, kind reader, can try this out on your Speccy - and tell me if it works?
Cheers! Thanassis.




