Not Every Byte Gets a Vote

In a deterministic game engine, replay starts simple: record inputs, run the same ticks again, and compare the result.

When I started wiring replay for the sim, my first instinct was just as simple:

Easy. Hash everything.

For the first few fields, that feels right. Actor health, projectile position, RNG state: if any of those differ, the run probably diverged.

Then the less obvious fields pile up. The AI has a trace explaining why it turned left. The renderer has interpolation state from the previous frame. Pathfinding has a cache full of directions. A struct has padding bytes because memory is memory.

The naive checksum was heading toward this:

hash(&world.entities);
hash(&world.projectiles);
hash(&world.rng);
hash(&world.ai_trace);        // uh oh
hash(&world.render_helpers);  // definitely uh oh

That kind of checksum treats every field as equally meaningful. It catches real divergence, but it also turns harmless implementation changes into replay failures.

The bug that made this obvious was small. I changed a helper field used for inspection, and the replay checksum moved. The sim still behaved the same. The player ended in the same place. The same enemies died. But the checksum said no.

Replay was failing because debug data had a different layout.
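
A minimal sketch of that failure, assuming a hypothetical Actor struct with a single inspection-only field. The names here (Actor, debug_note_id, hash_everything, hash_authoritative) are invented for illustration and are not the engine's real types. Walking every field lets the debug field move the hash, while a named field list stays put:

```zig
const std = @import("std");

// Hypothetical actor. `debug_note_id` stands in for any inspection-only field.
const Actor = struct {
    health: i32,
    x: i32,
    y: i32,
    debug_note_id: u32 = 0,
};

// Naive surface: walks every field, so the debug field gets a vote.
fn hash_everything(actor: Actor) u64 {
    var h = std.hash.Wyhash.init(0);
    inline for (std.meta.fields(Actor)) |field| {
        h.update(std.mem.asBytes(&@field(actor, field.name)));
    }
    return h.final();
}

// Manual surface: only the fields a later tick can branch on.
fn hash_authoritative(actor: Actor) u64 {
    var h = std.hash.Wyhash.init(0);
    h.update(std.mem.asBytes(&actor.health));
    h.update(std.mem.asBytes(&actor.x));
    h.update(std.mem.asBytes(&actor.y));
    return h.final();
}

test "a debug-only change moves the naive hash but not the manual one" {
    const before = Actor{ .health = 10, .x = 3, .y = 4 };
    var after = before;
    after.debug_note_id = 99;

    try std.testing.expect(hash_everything(before) != hash_everything(after));
    try std.testing.expectEqual(hash_authoritative(before), hash_authoritative(after));
}
```

The manual version costs one line per field, and that cost is the point: adding a field to the hash becomes a decision instead of a side effect.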

So the question became narrower:

Which state can change future gameplay?

player health          yes
projectile position    yes
RNG stream             yes
debug events           probably not; they are observations
render interpolation   no; useful, but not gameplay truth
pathfinding cache      maybe; save it or rebuild it, but name which one

For the pathfinding cache, I want an explicit decision. Either it is rebuilt before AI can read it, or it is persisted and treated as part of the runtime state that matters. What I want to avoid is a cache drifting into replay just because the checksum happened to traverse it.

This post walks through the split I ended up using in a Zig ARPG engine: authoritative gameplay state, derived caches, observation/debug output, and presentation state. The names are local, but making each field pick a role has been useful.

Determinism still needs the usual work: fixed ticks, explicit RNG, stable iteration order, initialized state, and no hidden dependency on render timing or local machine state. The checksum only tells me whether two runs arrived at the same authoritative state.

The tick gives replay a fixed checkpoint

The sim advances in fixed ticks, not render frames.

The outer tick function mostly schedules phases:

simulation.zig

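// File-level imports this excerpt relies on.
const std = @import("std");
const builtin = @import("builtin");
const assert = std.debug.assert;

pub fn tick(
    self: *Simulation,
    sim_input: Input,
    maybe_tick_events: ?*TickEventQueue,
) void {
    // Every tick must start from a clean idle boundary.
    assert(self.world.phase == .idle);
    self.world.assert_idle_phase_queues_drained();
    // The AI trace is per-tick inspection output, so it starts empty.
    self.world.ai_trace.clear();

    // Phases always run in this order.
    self.run_ingress();
    self.run_control();
    self.run_derive();
    self.run_plan(sim_input, maybe_tick_events);
    self.run_apply(maybe_tick_events);
    self.run_cleanup(maybe_tick_events);

    // Time advances exactly once, then the world returns to idle.
    self.world.tick_count += 1;
    self.world.transition_to(.idle);
    self.world.assert_idle_phase_queues_drained();

    // Full world validation runs in debug builds only.
    if (builtin.mode == .Debug) {
        self.world.validate();
    }
}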

I like this function because it has boring edges:

idle
  -> ingress    // admit queued world/session changes
  -> control    // update control state
  -> derive     // rebuild derived facts before decisions
  -> plan       // turn input and AI into planned work
  -> apply      // commit movement, physics, combat
  -> cleanup    // retire per-tick leftovers
  -> idle

Replay needs that kind of boring order. Every tick starts from idle, drains the queues it expects to drain, runs systems in a fixed order, increments time once, and returns to idle. That gives the checksum a specific point in the loop to measure.

If a queue leaks between phases, the next phase can read a command that belonged to the previous one. If a system mutates state in the wrong phase, it becomes harder to explain which tick caused which result. If the world does not return to idle, the next tick starts with unfinished work already loaded.

The tick boundary says where work is allowed to happen.
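
One way to make that boundary checkable is a transition function that accepts only the next phase in the list. This is a sketch under assumptions: the phase names follow the order above, but PhaseClock and next_phase are hypothetical and say nothing about how the real transition_to is written.

```zig
const std = @import("std");

// Phase names mirror the order listed above; the clock type is hypothetical.
const Phase = enum { idle, ingress, control, derive, plan, apply, cleanup };

// The only legal successor of each phase.
fn next_phase(phase: Phase) Phase {
    return switch (phase) {
        .idle => .ingress,
        .ingress => .control,
        .control => .derive,
        .derive => .plan,
        .plan => .apply,
        .apply => .cleanup,
        .cleanup => .idle,
    };
}

const PhaseClock = struct {
    phase: Phase = .idle,
    tick_count: u64 = 0,

    // Rejects skipped or repeated phases instead of letting work leak across them.
    fn transition_to(self: *PhaseClock, target: Phase) void {
        std.debug.assert(target == next_phase(self.phase));
        self.phase = target;
    }

    // One full tick: idle -> ... -> cleanup -> idle, with time advancing once.
    fn tick(self: *PhaseClock) void {
        std.debug.assert(self.phase == .idle);
        const order = [_]Phase{ .ingress, .control, .derive, .plan, .apply, .cleanup };
        for (order) |phase| {
            self.transition_to(phase);
            // A real engine would run the systems for `phase` here.
        }
        self.tick_count += 1;
        self.transition_to(.idle);
    }
};
```

With this shape, a system that tries to jump from plan straight to cleanup trips the assert in a safe build on the tick where it happens, not several ticks later as a checksum mismatch.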

Replay records inputs

For this replay setup, the file records what went in, not what came out.

If the replay file stores "the fireball hit for 18," replay is checking a recorded answer instead of checking the sim.

The contract is:

seed + input tape -> ticks -> same authoritative result

The recorder is deliberately small:

replay.zig

pub const Recorder = struct {
    inputs: [recording_ticks_max]Input = undefined,
    count: u32 = 0,
    seed: u64 = 0,

    pub fn push(self: *Recorder, input: Input) void {
        if (self.count >= recording_ticks_max) {
            @panic("replay input buffer overflow");
        }

        self.inputs[self.count] = input;
        self.count += 1;
    }
};

The current test recorder keeps seed, inputs, tick count, and a final checksum.

The harness runs once while recording, runs again while replaying, and compares the final checksum:

harness.zig

const record_checksum = try run_record_pass(
    allocator,
    scene,
    ticks,
    recorder,
    input_fn,
);

const replay_checksum = try run_replay_pass(
    allocator,
    scene,
    recorder.recorded_inputs(),
);

// Replay passes only if both runs arrive at the same authoritative state.
try testing.expectEqual(record_checksum, replay_checksum);
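
The whole contract fits in a toy. The sketch below is not the engine's harness: ToySim, its fields, and run are hypothetical, and the RNG is a hand-written splitmix64 so the example does not depend on any particular standard library RNG API. It shows only the shape: seed plus input tape in, checksum of the final state out.

```zig
const std = @import("std");

const Input = enum(u8) { none, left, right };

// Toy stand-in for the sim: everything a later tick can branch on lives here.
const ToySim = struct {
    rng_state: u64,
    x: i32 = 0,
    tick_count: u32 = 0,

    // Explicit RNG (splitmix64), so the stream depends only on the seed.
    fn next_random(self: *ToySim) u64 {
        self.rng_state +%= 0x9E3779B97F4A7C15;
        var z = self.rng_state;
        z = (z ^ (z >> 30)) *% 0xBF58476D1CE4E5B9;
        z = (z ^ (z >> 27)) *% 0x94D049BB133111EB;
        return z ^ (z >> 31);
    }

    // One fixed tick: draw from the RNG, apply the input, advance time once.
    fn tick(self: *ToySim, input: Input) void {
        const jitter: i32 = @intCast(self.next_random() % 3);
        switch (input) {
            .none => {},
            .left => self.x -= 1 + jitter,
            .right => self.x += 1 + jitter,
        }
        self.tick_count += 1;
    }

    // Native byte order is fine for a same-machine comparison like this one.
    fn checksum(self: *const ToySim) u64 {
        var h = std.hash.Wyhash.init(0);
        h.update(std.mem.asBytes(&self.rng_state));
        h.update(std.mem.asBytes(&self.x));
        h.update(std.mem.asBytes(&self.tick_count));
        return h.final();
    }
};

// seed + input tape -> ticks -> checksum of the final state.
fn run(seed: u64, tape: []const Input) u64 {
    var sim = ToySim{ .rng_state = seed };
    for (tape) |input| sim.tick(input);
    return sim.checksum();
}

test "same seed and tape arrive at the same checksum" {
    const tape = [_]Input{ .right, .right, .none, .left, .right };
    try std.testing.expectEqual(run(42, &tape), run(42, &tape));
}
```

Nothing in the tape says where the toy ended up. If the movement rule changes, both passes change together, and the comparison still tests the sim against itself.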

The checksum is the part that is easy to get wrong. If its surface misses state, real bugs can pass. If its surface includes helper state, maintenance changes look like nondeterminism.

The checksum does not make the sim deterministic. Fixed tick size, explicit RNG, stable iteration order, initialized state, and no hidden dependency on render timing or local machine state still do that work. The checksum is only the comparison point.

So the checksum surface has to be chosen deliberately.

The checksum names authoritative state

The checksum entrypoint lists the gameplay state that later ticks can branch on:

checksum.zig

pub fn compute(w: *const World) u64 {
    var h = std.hash.Wyhash.init(0);

    hash_world_header(&h, w);     // tick, phase, rng
    hash_entities(&h, w);         // actors and cached stats
    hash_item_store(&h, w);
    hash_ground_loot(&h, w);
    hash_trinket_runtime(&h, w);
    hash_tilemap(&h, w);
    hash_encounter_layout(&h, w);
    hash_encounter_exit(&h, w);
    hash_reward_chest(&h, w);
    hash_modifier_store(&h, w);   // runtime facts that can affect later ticks
    hash_behavior_store(&h, w);
    hash_scope_rewire_store(&h, w);
    hash_runtime_rule_store(&h, w);
    hash_projectiles(&h, w);
    hash_phase_queues(&h, w);
    hash_node_state(&h, w);

    return h.final();
}

This is a manual replay surface. I do not want to walk raw world memory and then debug every padding byte, helper field, and inspection buffer that lands in the hash.

Some entries are obvious: tick count, RNG state, entities, projectiles. Some are less obvious: runtime modifiers, behavior emissions, scope rewires, rules, and phase queues. Those are included because later ticks can branch on them.

Cached stats are included too, because later systems read those stats as runtime facts. If the cache is wrong, future gameplay can change. A helper cache can stay out only in two cases: it is rebuilt deterministically from authoritative state before anything reads it, or the sim never branches on the cache itself, only on the authoritative inputs behind it.

Leaving fields out is also intentional. A renamed debug event should not break replay. A nicer interpolation buffer should not break replay. A changed AI trace format should not make the sim look nondeterministic.

Those things still matter. Good debug output and good presentation are part of making the game work. But when replay fails, I want the failure to point at gameplay divergence instead of a renamed label or a render helper cleanup.

The target behavior is strict about state that can alter a tick and quiet about state that only helps me inspect or draw it.
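
For a sense of what one of those hash_* functions might look like, here is a sketch for a bounded projectile store. The real hash_projectiles takes the world and is not shown in this post, so the Projectile and ProjectileStore layouts, the hash_int helper, and the fixed little-endian byte order are all assumptions made for the example.

```zig
const std = @import("std");

const projectiles_max = 4;

// Hypothetical projectile: only fields a later tick can branch on.
const Projectile = struct { x: i32, y: i32, damage: u16 };

// Bounded storage: slots at index >= count are retired leftovers.
const ProjectileStore = struct {
    slots: [projectiles_max]Projectile = std.mem.zeroes([projectiles_max]Projectile),
    count: u32 = 0,
};

// Fixed byte order, so the checksum does not depend on host endianness.
fn hash_int(h: *std.hash.Wyhash, comptime T: type, value: T) void {
    const little = std.mem.nativeToLittle(T, value);
    h.update(std.mem.asBytes(&little));
}

// Hashes the live count and live slots only, field by field.
// Retired slots and struct padding never reach the hash.
fn hash_projectiles(h: *std.hash.Wyhash, store: *const ProjectileStore) void {
    hash_int(h, u32, store.count);
    for (store.slots[0..store.count]) |projectile| {
        hash_int(h, i32, projectile.x);
        hash_int(h, i32, projectile.y);
        hash_int(h, u16, projectile.damage);
    }
}

fn checksum(store: *const ProjectileStore) u64 {
    var h = std.hash.Wyhash.init(0);
    hash_projectiles(&h, store);
    return h.final();
}
```

Two stores with the same live projectiles hash the same even if a retired slot still holds old data. That is the same rule as for debug fields, applied inside a single array.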

Snapshots ask a different question

Replay and save/load exercise different contracts.

Replay asks:

If I start over from the same seed and inputs, do I arrive at the same authoritative state?

Snapshot asks:

Can I freeze this world, write it to bytes, restore it, and keep going?

Those overlap, but they are not the same test.

A replay checksum can ignore a cache if the cache is rebuilt deterministically before use. A snapshot may still serialize that cache because restoring it is cheaper, less awkward at that boundary, or more useful for debugging. Serializing a field does not automatically make it replay authority.

The snapshot encoder uses the same path for measuring and writing:

snapshot.zig

pub fn size_bytes(world: *const World) usize {
    var encoder = Encoder{ .buffer = null };
    encoder.encode_world(world);
    return encoder.offset;
}

pub fn encode(world: *const World, target: []u8) usize {
    var encoder = Encoder{ .buffer = target };
    encoder.encode_world(world);
    return encoder.offset;
}

If buffer is null, the encoder walks the protocol and counts bytes. If buffer exists, it writes. Using the same path keeps measuring and writing from drifting apart.
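
A reduced sketch of how one encoder can serve both passes. It assumes a nullable buffer and a running offset, as the excerpt above suggests, but the body of encode_value, the integer-only little-endian format, and the ToyWorld fields are guesses for illustration, not the engine's real protocol.

```zig
const std = @import("std");

// Hypothetical reduction of the encoder: integers only, little-endian.
const Encoder = struct {
    buffer: ?[]u8,
    offset: usize = 0,

    fn encode_value(self: *Encoder, comptime T: type, value: T) void {
        const little = std.mem.nativeToLittle(T, value);
        const bytes = std.mem.asBytes(&little);
        if (self.buffer) |buffer| {
            // Write pass: copy into the target at the current offset.
            @memcpy(buffer[self.offset..][0..@sizeOf(T)], bytes);
        }
        // Both passes advance the offset, so measured size equals written size.
        self.offset += @sizeOf(T);
    }
};

// Stand-in for the world: one protocol walk shared by both passes.
const ToyWorld = struct { tick_count: u32, seed: u64, player_health: u16 };

fn encode_toy_world(encoder: *Encoder, world: *const ToyWorld) void {
    encoder.encode_value(u32, world.tick_count);
    encoder.encode_value(u64, world.seed);
    encoder.encode_value(u16, world.player_health);
}

// Measure pass: no buffer, only the offset moves.
fn size_bytes(world: *const ToyWorld) usize {
    var encoder = Encoder{ .buffer = null };
    encode_toy_world(&encoder, world);
    return encoder.offset;
}

// Write pass: same walk, now with a target. A too-small target fails the
// slice bounds check in safe builds.
fn encode(world: *const ToyWorld, target: []u8) usize {
    var encoder = Encoder{ .buffer = target };
    encode_toy_world(&encoder, world);
    return encoder.offset;
}
```

For this toy world both passes report 14 bytes (4 + 8 + 2). Adding a field to encode_toy_world changes both numbers at once, because there is no second code path to forget.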

The field walker is generic. The protocol is still closed:

snapshot.zig

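// Walks one value by type. Anything outside the listed cases fails the build.
fn encode_field_value(self: *Encoder, comptime T: type, value: T) void {
    switch (@typeInfo(T)) {
        .void => {},
        .bool => self.encode_value(bool, value),
        .int => self.encode_value(T, value),
        // Enums are written as one byte; @intCast is safety-checked if a tag does not fit in u8.
        .@"enum" => self.encode_value(u8, @intCast(@intFromEnum(value))),
        // Fixed-size arrays: encode each element in order.
        .array => |info| for (value) |child| {
            self.encode_field_value(info.child, child);
        },
        // Structs: recurse field by field in declaration order, skipping comptime fields.
        .@"struct" => |info| inline for (info.fields) |field| {
            if (field.is_comptime) continue;
            self.encode_field_value(field.type, @field(value, field.name));
        },
        // Closed protocol: pointers, slices, floats, optionals, and unions are rejected.
        else => @compileError("unsupported snapshot field type: " ++ @typeName(T)),
    }
}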

The useful failure mode is the compile error. If a field type is outside the snapshot protocol, the build stops. It does not silently invent a format.

If I add a clever field and the snapshot layer refuses to guess how to serialize it, I want that failure.

Derived state is fine

A deterministic engine can still have derived state. Recomputing everything all the time is not more correct by itself; it can just hide that the cache contract was never written down.

The important part is naming derived state as derived.

The flowfield stores pathing data so AI can ask which way to move from a tile:

flowfield.zig

const tile_dim = constants.tile_dimension;
const DistanceGrid = [tile_dim][tile_dim]i16;
const DirectionGrid = [tile_dim][tile_dim]Direction;

pub const Flowfield = struct {
    distances: DistanceGrid = std.mem.zeroes(DistanceGrid),
    directions: DirectionGrid = std.mem.zeroes(DirectionGrid),
    target_tile: TileCoord = .{},
    valid: bool = false,
    // ...
};

AI reads this, so the cache needs a contract.

If the flowfield is rebuilt in the derive phase from authoritative inputs before AI reads it, the checksum can hash those inputs instead of the helper arrays. If a later tick can branch on a persisted valid flag or stale direction field without rebuilding, that field belongs in the replay surface or the rebuild contract is wrong.

The distinction I try to preserve is:

Is this source state, or a deterministic cache of source state?

In the current code, snapshots serialize the flowfield so save/load resumes from the exact cached data. Replay does not hash the helper arrays because the derive phase owns their contents before AI uses them.

Useful caches are the ones most likely to blur this line, so I try to keep the rebuild rule close to the system that reads the cache.
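
A one-dimensional toy version of that contract, as a sketch. ToyFlowfield, invalidate, rebuild, and step_from are hypothetical names, and the real flowfield is a 2D grid with a Direction type that is not reproduced here. What the sketch keeps is the rule: derive overwrites every entry from authoritative input, and the reader asserts that it happened.

```zig
const std = @import("std");

const tile_count = 8;

// One-dimensional toy flowfield; names and layout are hypothetical.
const ToyFlowfield = struct {
    distances: [tile_count]i16 = std.mem.zeroes([tile_count]i16),
    valid: bool = false,

    // Start of tick: nothing may read the cache until derive rebuilds it.
    fn invalidate(self: *ToyFlowfield) void {
        self.valid = false;
    }

    // Derive phase: every entry is overwritten from the authoritative target.
    fn rebuild(self: *ToyFlowfield, target_tile: usize) void {
        std.debug.assert(target_tile < tile_count);
        for (&self.distances, 0..) |*distance, tile| {
            const delta = if (tile > target_tile) tile - target_tile else target_tile - tile;
            distance.* = @intCast(delta);
        }
        self.valid = true;
    }

    // Plan phase: AI asks which way to step; a stale read trips the assert.
    // Returns -1 (left), 0 (already on the target), or 1 (right).
    fn step_from(self: *const ToyFlowfield, tile: usize) i8 {
        std.debug.assert(self.valid);
        if (self.distances[tile] == 0) return 0;
        if (tile > 0 and self.distances[tile - 1] < self.distances[tile]) return -1;
        return 1;
    }
};
```

Because rebuild writes every entry, whatever the arrays held before derive cannot influence step_from. That property is what allows the checksum to hash the target tile and skip the arrays.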

Events report committed work

The sim tick can optionally emit events:

simulation.zig

pub fn tick(
    self: *Simulation,
    sim_input: Input,
    maybe_tick_events: ?*TickEventQueue,
) void {
    // ...
}

The ? in ?*TickEventQueue matters. The sim can run without an event queue.

Turning events on or off should produce the same gameplay state. Events describe committed work: a skill started, a hit landed, a thing died, something interesting happened for VFX or tests.

They are outputs. No system should branch on the presence of the event queue or on the events that were emitted earlier in the tick. If a test changes the outcome by asking for events, the event path is mutating state it should only observe.

Input goes in. Gameplay state changes inside the tick. Events come out.

That keeps render, tests, and debug tooling from feeding back into the sim by accident.
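
The rule is easy to test in miniature. In this sketch, Event, EventQueue, Target, and apply_hit are hypothetical stand-ins rather than the engine's types. State changes first, the optional queue only reports it, and a test checks that asking for events changes nothing.

```zig
const std = @import("std");

const Event = enum { hit_landed, actor_died };

// Bounded queue; capacity and names are hypothetical.
const EventQueue = struct {
    items: [8]Event = undefined,
    count: usize = 0,

    fn push(self: *EventQueue, event: Event) void {
        std.debug.assert(self.count < self.items.len);
        self.items[self.count] = event;
        self.count += 1;
    }
};

const Target = struct { health: i32 };

// Gameplay state changes first; events only report what was already committed.
// No branch below depends on whether a queue was supplied.
fn apply_hit(target: *Target, damage: i32, maybe_events: ?*EventQueue) void {
    target.health -= damage;
    if (maybe_events) |events| events.push(.hit_landed);

    if (target.health <= 0) {
        if (maybe_events) |events| events.push(.actor_died);
    }
}

test "events on or off, the target ends in the same state" {
    var quiet = Target{ .health = 10 };
    var observed = Target{ .health = 10 };
    var events = EventQueue{};

    apply_hit(&quiet, 12, null);
    apply_hit(&observed, 12, &events);

    try std.testing.expectEqual(quiet.health, observed.health);
    try std.testing.expectEqual(@as(usize, 2), events.count);
}
```

In the full sim the same check would compare checksums from a run with a queue against a run without one.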

Render reads sim facts

Render is allowed to be smart about presentation. It can interpolate, draw telegraphs, sort sprites, play VFX, and make the game readable.

Gameplay facts still need to come from the sim.

If something is dangerous, the sim should expose that fact. Render can color it red, pulse it, or make it dramatic. Render should not infer danger from pixels or animation timing.

The deadline shortcut is tempting:

Just check the animation frame.

That turns presentation into a hidden gameplay dependency.

In this codebase, the same rule shows up elsewhere: app routes input, game owns session meaning, sim owns encounter truth, render presents committed truth.

The boundary can move as the project changes. If it moves, it should move intentionally. Accidental authority is the bug.

The working checklist

When new state appears, I use this checklist:

Can this state change future gameplay?
  yes -> checksum/replay authority
  no  -> keep asking

Is it needed to restore and continue correctly?
  yes -> snapshot surface
  no  -> keep asking

Is it an observation of committed gameplay?
  yes -> event/debug/inspect surface
  no  -> keep asking

Is it only presentation?
  yes -> render/app state

These categories are about contracts, not importance. Render, debugging, and snapshots all matter. They just answer different questions.

Authoritative state is state the future sim can branch on.

Everything else can be useful without being replay evidence.

Why bother?

Because failures get routed to a smaller part of the code.

When something breaks, the category narrows the search:

replay checksum changed
  -> authoritative sim state, RNG, iteration order

save/load diverges after restore
  -> snapshot protocol or rebuild boundary

visual is wrong but checksum is stable
  -> render projection, sort, interpolation, VFX

event is wrong but sim is right
  -> observer/event emission

cache change affects gameplay
  -> cache rebuild contract leaked

This is the practical payoff: fewer places for bugs to hide.

Most of the code here is phase order, bounded storage, explicit hashing, narrow codecs, optional event queues, and assertions. None of that is exotic. The hard part is deciding what each piece of state is allowed to mean.

The design target is modest: debug fields should not break replay, caches should not survive as hidden sources of gameplay, and render should not become combat logic.

Not every byte gets a vote.