Describing the design is only half of RTL work. The other half is debugging it.
The bug that really sold me on this workflow showed up in translucent overlays and text. Most of the frame looked correct, but small clusters of pixels would go mysteriously missing. Because destination-color blending reads the existing framebuffer value, the obvious theory was a memory-ordering bug: a stale read, a read/write hazard, or perhaps the new fill cache occasionally returning old data.
Figure 3: Hardware (mine, left) vs. reference (86Box, right). The symptom looked like a framebuffer hazard: a few blended overlay pixels are lost while most of the frame remains correct.
That theory was plausible enough that I chased it hard. I changed write priority, added a true direct no-cache path, and compared alternate-buffer reads. The artifact barely moved. That was the twist. It looked like a framebuffer hazard, but the evidence kept refusing to line up with that explanation.
This was where a netlist-aware trace helped much more than a conventional waveform viewer. Instead of staring at a large set of signals and manually aligning them across time, I used conetrace to follow the failing pixels stage by stage through the rasterizer, the TMU, the color-combine logic, and finally the framebuffer output. Once I could trace the suspect pixels end to end, the cache theory collapsed: the wrongness was already present before the framebuffer path could plausibly explain it.
```
$ conetrace rv path core_1.rasterizer_1.o core_1.writeColor.i_fromPipeline --track 5241000

# Same fragment enters both paths; the tiny W precision loss is already present.
ref: {x: 396, y: 189, W: 1.972427, S: 124.492, T: 57.031}

# Perspective rounding and per-pixel LOD already differ in the TMU path.
ref: {S': 63.492, T': 14.031, lod: 1, texel: 0x6B}

# Destination color matches exactly, which rules out the cache theory.
ref: {dst565: 0x4A29}

# The reference blend path effectively uses dither-subtracted destination color.
ref: {src: 0x5A8C, dst_blend: 0x49E7, out: 0x4A69}

# The RTL lands much darker than the reference by the final writeback.
ref: {final565: 0x4A69}
```
The real issue was not one catastrophically broken block. It was a stack of small hardware-accuracy mismatches that only became visible together.
The first problem was precision. Float-triangle `W` was being quantized too early as it passed through the TMU path. The second was that perspective texcoord rounding and per-pixel LOD adjustment were slightly off near mip boundaries. The third was in blending: I was using the expanded destination color for blend-factor math, but real Voodoo behavior effectively wants the dither-subtracted destination color instead.
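To make the third mismatch concrete, here is a minimal Python sketch of dither-subtracted destination color. It assumes a classic 4x4 Bayer matrix and illustrative per-channel scaling; the real Voodoo dither tables and subtraction rules differ, and both helper names are hypothetical:

```python
# Illustrative sketch only: a Bayer 4x4 matrix stands in for the real
# Voodoo dither tables, and the >>1 / >>2 channel scaling is assumed.
DITHER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def expand565(p):
    """Expand a packed RGB565 pixel to 8 bits per channel by bit replication."""
    r = (p >> 11) & 0x1F
    g = (p >> 5) & 0x3F
    b = p & 0x1F
    return ((r << 3) | (r >> 2),
            (g << 2) | (g >> 4),
            (b << 3) | (b >> 2))

def dither_subtracted_dst(p, x, y):
    """Approximate the pre-dither destination color: expand the stored
    RGB565 value, then subtract the dither offset that was added when
    the pixel was written back."""
    d = DITHER_4X4[y & 3][x & 3]
    r, g, b = expand565(p)
    return (max(r - (d >> 1), 0),   # 5-bit channel (assumed scaling)
            max(g - (d >> 2), 0),   # 6-bit channel (assumed scaling)
            max(b - (d >> 1), 0))
```

Feeding the expanded value straight into the blend-factor math, instead of the dither-subtracted one, systematically biases the blend by up to a dither step per channel, which is exactly the kind of small darker-or-lighter drift that only shows on blended pixels.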
Each of those behaviors was almost right in isolation. Together, on exactly the right class of blended textured primitives, they produced visibly wrong pixels. That is why the bug felt random. Most of the frame was fine, and even the failing path was only wrong in a narrow corner of the state space.
The fix was to stop arguing from the first plausible theory and instead match the machine stage by stage. I preserved wider `W`, `S`, and `T` accumulators, corrected the perspective rounding and LOD math, and fed dither-subtracted destination color into the blend-factor computation. Once those details matched the reference behavior, the "memory-ordering bug" disappeared, because it had never been a memory-ordering bug at all.
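The W-precision half of the fix is easy to demonstrate in isolation. This sketch uses made-up coordinate values and an assumed fixed-point width, chosen only to show how truncating `W` before the perspective divide can push a texcoord across a rounding boundary:

```python
# Illustration only: the 8-fractional-bit width and the S value are
# hypothetical, picked to sit near a rounding boundary.

def quantize(value, frac_bits):
    """Truncate a value onto a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return int(value * scale) / scale

W = 1.972427         # per-pixel W from the trace
S = 125.2294         # hypothetical S near a .5 texcoord boundary

s_wide   = S / W                  # perspective divide with full-precision W
s_narrow = S / quantize(W, 8)     # same divide after truncating W early

# The sub-LSB loss in W shifts S' across the .5 boundary, so the rounded
# integer texcoord changes, and with it the fetched texel and the LOD.
print(round(s_wide), round(s_narrow))
```

Keeping the wider `W`, `S`, and `T` accumulators all the way to the divide removes this entire class of off-by-one texel selections.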
A conventional waveform viewer can show every signal involved here, but it leaves most of the reconstruction to the engineer. A netlist-aware query tool moves some of that reconstruction into the tooling itself. On a design like the Voodoo, that difference is the gap between a plausible theory and an actual explanation.