C64: Putting Sprite Reuse to Work

Last week’s article summarized the sprite system on the Commodore 64. I’ve done this for quite a few other platforms at this point, and I’ve usually followed up with a short project that makes heavy use of sprites and often reuses them over the course of each frame. To date, this project has always been my “shooting gallery Rosetta Stone” project. However, for the C64, I have unfinished sprite-related business with an entirely different Rosetta Stone project:

When last I left the C64 Lights-Out project, I’d used a set of static sprites to add rivets and drop shadows to the game board. At the time, I left it at that and did not attempt to completely close the gap with the NES display. In particular, the buttons on the board are flat, and on the NES and PICO-8, they both had drop shadows of their own and animated being pressed when they were selected. We used all eight of the VIC-II chip’s sprites to produce the shadows we have above; we’d need 33 to do what we want properly.

As of last week, that isn’t going to stop us anymore. We need 33 sprites at once, and 33 sprites at once we shall have. Animating the button presses will also take us into slightly unknown territory; on the NES and PICO-8 the buttons were blank, but the buttons have labels here on the C64 and we will need to animate those too.

Designing Our New Display

Our previous display uses all 8 sprites to produce the rivets and shadows: one sprite for each corner, and then two expanded sprites each for the left and bottom sides. We now wish to add new sprites to represent drop shadows for each of the puzzle cells. Those cells are each squares 3 characters to a side (so, a 24×24 pixel region), but thanks to the bezels around each button, the shadow itself only needs to be 19 pixels tall. That fits neatly into our 24×21 sprite space.

With five cells per row, we’ll need to dedicate five sprites to the cells’ drop shadows. We will reuse those sprites in each subsequent row, and if a shadow needs to go missing (because the corresponding button is depressed), we will simply disable that sprite on that row via the $D015 register. Two sprites will suffice for the corners; the top corners may be similarly re-used for the bottom ones. That leaves one sprite spare, and it seems like it would be most conveniently used for the rest of the left-side shadow.

That leaves the bottom shadow. For that, we’ll need to re-use two of the sprites we’d previously been using for cell shadows. This will impose the tightest timing requirements upon us of anything in the design. Because we are altering the sprite graphics, the magnification, and the X coordinate, we cannot reconfigure the sprite registers until after we have finished rendering the last cell, and we’ll only have a few scanlines of space to do that work. As we’ll see once we get into the nuts and bolts, we’ll normally have dozens.

With sprite usage sorted out, we may now move on to defining the new graphics. We have two new graphical elements to define: the cells’ drop shadows (which fit in one sprite pattern) and alternate graphics for a puzzle cell when it’s being pushed down (which consumes 8 ideally consecutive character codes).

We have plenty of memory for both of these, but we do hit another inconvenience when we define the new characters; we’re using the Extended Color Mode for both the game board and the status window, and we cannot get 8 characters together for our new button graphic without overwriting some of the punctuation that we need in the status line. That’s not really a problem for us, though, because we can just turn off Extended Color Mode for the status line and get 256 characters back. In fact, as long as we’re doing that, we really might as well just switch back to the system font and get mixed-case text while we’re at it:

Going back to the system font like this also makes our final graphics challenge trivial. The character set that we are using for the game board only uses letters as part of the labels, and it uses each letter at most once. We may animate the letters during the button press simply by redefining the character in place.

Assigning the Sprites

In the original code, our sprites only had to be configured once, and we did it at program start. Sprite data mostly fits neatly into a few contiguous ranges (X-Y coordinates, colors, and patterns), and the code just copied values into place out of tables. We now have to revisit our design so that we can efficiently update everything as needed.

I’ve decided to organize the sprites so that the more often they get updated, the lower a sprite index they have. That means that sprites 0-4 represent the cells’ drop shadows (changing 4 times a frame just for that), the corner sprites are 5-6 (changing twice a frame in both location and shape) and the left drop shadow is sprite 7 (changes twice a frame but just location). Sprites 0 and 1 will also be the sprites further repurposed for the bottom-side drop shadow (which changes their location, shape, and magnification).

At the start of the frame, all 8 sprites are in their uppermost positions, so this is where we initialize them:

Their patterns are also consistent at this point:

When updating the rows we’ll only need to update the Y coordinates, but we do need a table of subsequent Y coordinates. That’s one row shorter than usual because we handled the top row as part of the original shadow_loc definition:

The bottom drop shadow is ad-hoc enough that we’ll just be assigning immediate values to sprite registers for those. Changing the sprites back will be managed via the shadow_loc and shadow_pat reads at the end of the frame.

The last thing we need will be strictly in RAM. We’ll declare a five-element array of the value to put in the sprite-enable register for each row of cells:

        .space  spr_enable 5

This and the sprite color registers can both be initialized with simple loops; all sprites are color 11 at all times, and the initial value of every element in the spr_enable array will be $FF.

Implementing the Interrupt Handler

In our initial program, there are only two interrupts per frame: one mid-screen to prepare for drawing the board and status line, and one at the bottom to reset the display for the logo next frame. This meant that the interrupt handler could decide what to do simply by checking the current scanline as reported by the $D012 register; if the value was over 127, do the end-of-frame work, and otherwise do the mid-frame work. That simple test required only a single instruction, and while the exact value of $D012 might vary, the test was broad enough that it didn’t have to worry about it.
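For readers who have not seen that earlier program, a test of that shape can be sketched as follows. This is an illustration in the same Ophis syntax rather than the original code, and the label names are made up. It relies on the fact that "over 127" is the same thing as "bit 7 is set", which BIT copies straight into the N flag.

```asm
.scope
old_irq_test:                           ; Hypothetical name
        bit     $d012                   ; N flag := bit 7 of the raster line
        bmi     _endframe               ; Line 128 or later: end-of-frame work
        ; (mid-frame work would go here)
        rts
_endframe:
        ; (end-of-frame work would go here)
        rts
.scend
```

Because only bit 7 is examined, the interrupt can arrive a line or two late without changing which branch is taken.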

We are no longer so lucky. We now have no fewer than eight interrupt points:

One interrupt between the logo and the board. This enables Extended Color Mode and alters the horizontal scroll value to properly center the board. This interrupt is identical to our original mid-frame interrupt.
Four interrupts, one each inside the top four rows of puzzle cells. This updates the Y coordinates of the drop shadows and enables the appropriate columns with a write to $D015. One of these four will also be responsible for moving the other three sprites into their lower positions, thus handling the bottom half of the left-side shadow and the corners.
One interrupt between the last row of cells and the end of the board. This repurposes two of the cell drop shadows to serve as the bulk of the bottom shadow of the board itself.
One interrupt between the end of the board and the beginning of the status window. This swaps back to the system font and disables ECM. Since the character set switch is more or less instantaneous, we must take care to ensure that we are entirely out of the board’s last character row or there will be extra garbage rendered between the board and the status lines.
One interrupt at the very end of the display to restore all sprites and control registers for the start of the next frame. This is also the only interrupt that is permitted to forward to the KERNAL’s default IRQ handler (advancing the clock, scanning the keyboard, etc.) The timing between the other interrupts is too tight to permit it anywhere else.

That’s a lot, and some of these interrupts only have a few scanlines worth of time that they may operate in. We can’t rely on $D012 to identify what to do anymore; instead we’ll keep a separate counter, increment it on each IRQ, and loop it around at the end. We can then also use that counter as an index into various arrays to efficiently handle many of the differences.

        .data
        .space  _irq_phase 1
        .text

One of those differences will be “where the interrupt happens in the first place.” After a bit of experimentation, I come up with this table:

I’ve put the bottom-of-the-frame interrupt first here. This is mainly out of laziness—since this is the interrupt that restores the display for the next frame, putting it first means that it will also end up being the screen initialization code.

Starting and Stopping the Interrupt System

When setting up the IRQ, the first thing we need to do is put the IRQ phase and the sprite-enable flags into their initial states, as we discussed above.


enable_display_irq:
        lda     #$00                    ; Start in phase 0
        sta     _irq_phase
        lda     #$ff                    ; Start with all shadows enabled
        ldx     #$04
*       sta     spr_enable,x
        dex
        bpl     -                       ; BPL, not BNE, so element 0 is set too

Then we need to configure the interrupt to fire on the first phase point. This part of the code is mostly unchanged from our previous run, and frankly not that different from the process of doing it from BASIC.

        lda     #$7f                    ; Disable timer IRQ
        sta     $dc0d
        lda     #$1b                    ; All Raster IRQs in 0-255 range
        sta     $d011
        lda     irq_rows                ; Set first Raster IRQ
        sta     $d012
        lda     #<_irq                  ; Configure custom IRQ handler
        sta     $314
        lda     #>_irq
        sta     $315
        lda     #$01                    ; Enable raster IRQ
        sta     $d01a
        rts

Disabling the IRQ basically does the IRQ configuration but in reverse…


disable_display_irq:
        lda     #$00                    ; Disable raster IRQ
        sta     $d01a
        lda     #$31                    ; Restore system IRQ handler
        sta     $314
        lda     #$ea
        sta     $315
        lda     #$81                    ; Re-enable system timer
        sta     $dc0d

… but we also have an extra complication. We could be in any of several display states depending on exactly when we were called. We need to put the graphical systems back just how we found them, no matter what we had done before. This ends up just being a pile of register writes.

        lda     #$1b                    ; Disable ECM
        sta     $d011
        lda     #$08                    ; System-standard HSCROLL
        sta     $d016
        lda     #$14                    ; Restore normal charset
        sta     $d018
        lda     #$00
        sta     $d015                   ; Disable all sprites
        sta     $d017                   ; De-magnify all sprites
        sta     $d01d
        rts

The Handler Itself

The IRQ handler has a lot of tasks, but we can check them in sequence and coalesce the similar versions. Our first step in all cases is simply to acknowledge the interrupt:


_irq:   lda     #$01
        sta     $d019

Then we have to look at the phase. There are several possible ways to efficiently implement a test on a large number of cases; I covered one in my grimoire of 8-bit Implementation Patterns. That one relied on a jump table and could hit all targets in the same amount of time. I decided not to go with that one here, because only one of our targets is really time-critical and enough are similar that we can simply do a chain of if/else statements and still be fine.

It’s not our most expensive case, but we consider phase 0 first simply because we end up checking for it as part of loading the phase in the first place:

        ldx     _irq_phase
        bne     _not0

Phase 0 is where we initialize all our sprite data for the frame, as well as set up the character graphics for the logo display. This is straightforward though it does need a few loops.

        lda     #$08            ; Phase 0: Scroll 0 for centered logo
        sta     $d016
        lda     #$18            ; Back to the graphical charset
        sta     $d018
        ldx     #$10            ; Restore sprite positions and configurations
*       lda     shadow_loc,x    ; X and Y coordinates
        sta     $d000,x
        dex
        bpl     -
        ldx     #$07
*       lda     #$0b            ; Colors
        sta     $d027,x
        lda     shadow_pat,x    ; Pattern indices
        sta     $7f8,x
        dex
        bpl     -
        lda     #$80            ; Y Expand (sprite 7 only)
        sta     $d017
        lda     spr_enable      ; Top row of enabled sprites
        sta     $d015
        lda     #$00            ; X Expand
        sta     $d01d
        beq     _done

If we weren’t zero, we then check for phase 6. This one is our time-critical handler; we need to relocate sprites 0 and 1 to the bottom of the board and change their X coordinates, pattern indices, and magnification status. I don’t bother with tables or anything else for this step, and simply load the values we need with direct copies straight out of immediate values.


_not0:  cpx     #$06
        bne     _not6
        lda     #$cd            ; Handle phase 6: move bottom shadow into place
        sta     $d001           ; Y coordinates
        sta     $d003
        lda     #$8f            ; X coordinates
        sta     $d000
        lda     #$bf
        sta     $d002
        lda     #$c3            ; Pattern indices
        sta     $07f8
        sta     $07f9
        lda     #$03            ; X magnification
        sta     $d01d
        lda     #$ff            ; Ensure all sprites enabled for this part
        sta     $d015
        bne     _done

All of these steps finish up with some load of a constant value that sets the zero flag to something consistent. That means I get to save a byte on the way out of all of these branches by using BEQ or BNE as appropriate instead of a JMP.

Phase 7 is much like Phase 6 except with a somewhat looser time constraint and much less to do. We just write two registers.


_not6:  cpx     #$07
        bne     _not7
        lda     #$16            ; Phase 7: Mixed-case charset for status bar
        sta     $d018
        lda     #$1b            ; Disable ECM so we can do mixed-case properly
        sta     $d011
        bne     _done

The last fully unique phase is phase 1, which basically undoes the work of phase 7. However, phases 2-5 will also need to turn the phase number into an index ranging from 0-3, so instead of comparing the value against 1 we decrement it and check if that made it zero:


_not7:  dex
        bne     _not1
        lda     #$5b            ; Phase 1; Enable ECM for board
        sta     $d011
        lda     #$0c            ; And scroll 4 to the right to center it
        sta     $d016           ; and the instructions
        bne     _done

We can then, in the 2-5 case, decrement it again to get our index. We pull the new Y coordinate out of the shadow_rows array and feed it to the 5 sprites we’re moving, and then pull the new value for $D015 from its table and load it into place too:


_not1:  dex
        lda     shadow_rows,x   ; Phases 2-5; move cell shadows down a unit
        ldy     #$08
*       sta     $d001,y
        dey
        dey
        bpl     -
        lda     spr_enable+1,x  ; Set this row's sprite-enable
        sta     $d015

That one is a little bit tricky since we have to remember that spr_enable is a five element array and we processed the first element back in phase 0.

In most cases, we’re done, but phase 3 has some extra work to move sprites 5-7 into their final positions. We can check for that with a third DEX:

        dex
        bne     _done           ; For phases != 3, we're done

For a “classic” chain of if/else tests to go through options, this sort of continuous decrement is the way to do it. In this case, we have to move all three sprites down a bit and also change the patterns on sprites 5 and 6 to make them be the bottom corners instead of the top ones:

        lda     #$ba            ; For phase 3, move the corners down
        sta     $d00b
        sta     $d00d
        lda     #$90            ; Also the left shadow
        sta     $d00f
        lda     #$c2            ; Adjust sprite patterns
        sta     $07fd
        lda     #$c4
        sta     $07fe

This then falls through into the tail of the IRQ handler. Our first task is to set up the next interrupt, which we do by reloading the original IRQ phase value, incrementing it, wrapping around if necessary, loading the corresponding scanline into $D012 to register it with the IRQ mechanism, and storing the new phase back for next time:


_done:  ldx     _irq_phase
        inx
        cpx     #phase_count
        bne     +
        ldx     #$00
*       lda     irq_rows,x
        sta     $d012           ; Register next IRQ line
        stx     _irq_phase

Normally, what then happens at this point is that we’d check whether there would be a timer IRQ and return either to the original system IRQ handler or to the simple cleanup routine depending on whether it’s waiting. However, I found that I couldn’t get away with that here; too many of these IRQs are too closely spaced to safely spend the time needed to process a keystroke if there is one. I found that pressing keys would cause the shadows to flicker or the text to glitch out. As such, I only permit the system handler to run at the very end of the screen:

        cpx     #$00            ; At bottom of screen?
        bne     _notim          ; If not, return immediately
        lda     $dc0d           ; Check if there'd have been a timer IRQ
        beq     _notim
        jmp     $ea31           ; If so, jump to it
_notim: jmp     $febc           ; If not, clean up

This is perfectly fine on an NTSC system, but it does mean that if this runs on PAL the system clock (which is still running at 60Hz) will end up falling behind because it’s only running the system timer at 50Hz now. That’s a cost, I suppose, but it’s one we’ll happily pay.

Animating keypresses

The C64 version of Lights-Out is one of the first ones I wrote, and as a result it was a bit more ad-hoc about its display code. When moves are made, it goes and updates the cells that changed immediately; it wasn’t until I started making versions for game consoles that I made a sharper distinction between “model” and “view”, rendering the full board anew each frame from a central, more abstract representation of the puzzle state.

That’s helpful for me here, though, because it means that I already have a bunch of routines handy that will do things like compute where, exactly, on the screen the cell is corresponding to the move the player just made, and it also means that the rest of the game logic is already such that I can rely on screen memory holding the most up-to-date information I need.

What that means is that my strategy for animating the button presses can be pretty straightforward, and will hinge on a function that flips the pressed-ness of whatever button corresponds to its argument. The overall logic goes like this:

1. During startup, assign the characters so that the “in” and “out” graphics are each contiguous blocks of 8 characters in the font definition. I ended up assigning “in” to the range $26-$2D and “out” to the range $32-$39. Thanks to how extended color mode works, the cells being lit or not is encoded in the most significant bit, so “on” cells should add $80 to both these values.
2. The flip_cell function itself opens by converting its argument (a number from 0-24, representing buttons A through Y) into a pair of X/Y coordinates and a pointer to the top-left of the cell in screen memory.
3. We determine the new state by loading the character in the upper left of the cell and then XORing it with the value $14, which will convert one to the other while preserving lit/unlit status.
4. Write out the eight new characters to their appropriate locations in the cell.
5. Update the spr_enable array appropriately. If we have pushed the button in, we must turn off the column’s corresponding bit in the byte corresponding to our row. Otherwise, we must restore it.
6. Update the definition of the character that is its label appropriately. Our custom font is stored at $2000; the definition for the character labeling cell N is $2008 + N*8. If we are pushing the button in, we move the character down and to the left one pixel; otherwise we move it up and to the right.
7. Call this function once at the start of a move; eight frames after doing this, call it again.

The implementation of this doesn’t really make any of this clearer, but there were a few handy tricks I was able to use while working through it. One of them was that I didn’t need to record anywhere whether we were pushing the button in or releasing it; we could deduce that from the last value we wrote in step 4. Another was in step 6; the N*8 term never exceeds 192 (N is at most 24), which means that we can put that result in an index register and do all our copying by indexing from a base of $2008 or $2009 as needed. No explicit pointers were necessary at all.
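To make those two tricks a little more concrete, here is a rough sketch of steps 3, 5, and 6 in the same Ophis syntax. It is not the code from the final program. The zero-page locations cell_ptr, cell_row, and cell_col, the col_bit table, the routine names, and the idea that column c’s shadow is sprite c are all assumptions made for illustration, and the sketch relies on the spr_enable array declared earlier. It also assumes that the top-left character of a cell is the first one in its block of eight, and that each label glyph has a blank left column and bottom row to shift into.

```asm
        .alias  cell_ptr  $fb           ; Assumed: zero-page pointer to cell's top-left
        .alias  cell_row  $fd           ; Assumed: row of the cell, 0-4
        .alias  cell_col  $fe           ; Assumed: column of the cell, 0-4
        .alias  font_base $2008         ; Glyph for cell 0, per the text

.scope
toggle_cell:
        ldy     #$00
        lda     (cell_ptr),y            ; Top-left character of the cell
        eor     #$14                    ; $26 <-> $32; bit 7 (lit) is preserved
        sta     (cell_ptr),y
        and     #$7f                    ; Drop the lit/unlit bit
        cmp     #$26                    ; Did we just write the "in" graphic?
        php                             ; Remember the answer
        ldx     cell_col
        lda     col_bit,x               ; Bit for this column's shadow sprite
        ldx     cell_row
        plp
        beq     _in
        ora     spr_enable,x            ; Released: the shadow comes back
        bne     _store                  ; Always taken; the column bit is set
_in:    eor     #$ff                    ; Pressed: clear just this column's bit
        and     spr_enable,x
_store: sta     spr_enable,x
        rts

push_label:                             ; Entry: A = cell number N, 0-24
        asl
        asl
        asl                             ; A = N*8, at most 192
        clc
        adc     #$06                    ; Index of row 6 of the glyph
        tax
        ldy     #$06                    ; Seven rows to move: 6 down to 0
*       lda     font_base,x             ; Take row r,
        asl                             ; shift it one pixel left,
        sta     font_base+1,x           ; and store it as row r+1
        dex
        dey
        bpl     -
        inx                             ; X is one below row 0; step back to N*8
        lda     #$00
        sta     font_base,x             ; Blank the new top row
        rts

col_bit: .byte  $01,$02,$04,$08,$10
.scend
```

toggle_cell only rewrites the top-left character; the other seven from step 4 would be written the same way. Releasing the label would mirror push_label: walk rows 1 through 7 upward, use LSR instead of ASL, and blank row 7.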

With that, our work is done, and we can say that we really have matched or exceeded the NES version with our C64 port. The final source code is here, and the compilation archive will also offer all historical versions.

Getting Mocked by the Cyberpunk Pioneers

One of the more unusual, but useful, artifacts of the cyberpunk movement in SF was a document called the Turkey City Lexicon, which collects a bunch of handy terms for critiquing rough-hewn work in writer’s workshops. They are of… varying degrees of kindness towards their subjects, but some of them have broader currency too.

For instance:

The Rembrandt Comic Book: A story in which incredible craftsmanship has been lavished on a theme or idea which is basically trivial or subliterary, and which simply cannot bear the weight of such deadly serious artistic portent.

My first implementations of Lights-Out were as, essentially, throwaway sample programs. The first C64 edition was under a kilobyte of machine code and could have just as easily been written in a screen or two of BASIC code:

We have now pushed this completely beyond the bounds of reason, making use of nearly every feature of the VIC-II in order to get parity with what our original level of effort would buy us a half a generation later on the NES:

This version is pushing three kilobytes. It is larger than Simulated Evolution, which is a significantly more seriously-intended application. Almost none of what we have done over the years has actually improved the gameplay over that first crude sketch. What are we doing here?

I’d normally answer that question by saying that the program wasn’t the point; it was just the excuse to put these techniques through their paces. In that sense, this really is not becoming a Rembrandt Comic Book; the problem of rendering a fancy game board with lots of colors while retaining the highest display resolution is both not trivial and also clearly pays off. Beyond that, though, I think that what we’ve really been buying here are the incremental improvements.

The first sketch was written to cheaply interoperate with BASIC’s math libraries, mostly for the random number generator.
The second brought in Extended Color Mode and imported both the best parts of the displays from the Sinclair systems as well as a 6502 port of the Xorshift RNG I adapted for the ZX81.
The third visit brought in mid-screen interrupts to properly center the display while letting the NES’s graphical logo remain untouched.
The fourth used sprites to provide highlights.
The fifth expanded the use of sprites with more aggressive use of mid-screen interrupts, allowing it to match the NES very closely.

That’s not nothing. And while I do intend to eventually try to get as good a game-board display out of the TMS9918A chip as I can, one of the things that I am taking from this is that I should also sketch out a bunch of way stations along the way, where a “reasonable” designer might conclude that this was sufficient for the complexity of the application itself.