Building an FPGA 3dfx Voodoo with Modern RTL Tools

Original link: https://noquiche.fyi/voodoo

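Debugging an RTL design is often more challenging than designing it. This account covers tracking down a subtle missing-pixel bug in a graphics rendering pipeline, initially suspected to be a memory-ordering problem in the framebuffer. Despite extensive investigation (changing write priority and cache paths), the bug persisted, contradicting the initial hypothesis. The breakthrough came from a netlist-aware tracing tool ("conetrace"), which followed the failing pixels through every stage of the pipeline and revealed the error *before* it reached the framebuffer. This showed that the problem was not a single catastrophic failure but a series of small inaccuracies accumulating across the system. Specifically, precision loss during texture mapping, slight discrepancies in perspective correction and level-of-detail (LOD) calculations, and an incorrect blend calculation (using the expanded rather than the dither-subtracted destination color) combined to produce the visible error. Each problem was small on its own, but together they produced an obvious error in specific rendering scenarios. The fix involved keeping wider accumulators, correcting the calculations, and implementing the correct blending method, ultimately demonstrating the power of targeted tracing tools in complex RTL debugging.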

Hacker News: Building an FPGA 3dfx Voodoo with Modern RTL Tools (noquiche.fyi), 9 points, submitted by fayalalebrun 1 hour ago

Original text

Describing the design is only half of RTL work. The other half is debugging it.

The bug that really sold me on this workflow showed up in translucent overlays and text. Most of the frame looked correct, but small clusters of pixels would go mysteriously missing. Because destination-color blending reads the existing framebuffer value, the obvious theory was a memory-ordering bug: a stale read, a read/write hazard, or perhaps the new fill cache occasionally returning old data.

voodoo_bug.png

Figure 3: Hardware (mine, left) vs. reference (86Box, right). The symptom looked like a framebuffer hazard: a few blended overlay pixels would be lost while most of the frame remained correct.

That theory was plausible enough that I chased it hard. I changed write priority, added a true direct no-cache path, and compared alternate-buffer reads. The artifact barely moved. That was the twist. It looked like a framebuffer hazard, but the evidence kept refusing to line up with that explanation.

This was where a netlist-aware trace helped much more than a conventional waveform viewer. Instead of staring at a large set of signals and manually aligning them across time, I used conetrace to follow the failing pixels stage by stage through the rasterizer, the TMU, the color-combine logic, and finally the framebuffer output. Once I could trace the suspect pixels end to end, the cache theory collapsed: the pixel values had already diverged from the reference before they reached the framebuffer path, so nothing in that path could explain the error.

Terminal

$ conetrace rv path core_1.rasterizer_1.o core_1.writeColor.i_fromPipeline --track 5241000

Annotation

1. Rasterizer

Same fragment enters both paths; the tiny W precision loss is already present.

ref: {x: 396, y: 189, W: 1.972427, S: 124.492, T: 57.031}

2. First divergence

Perspective rounding and per-pixel LOD already differ in the TMU path.

ref: {S': 63.492, T': 14.031, lod: 1, texel: 0x6B}

3. Framebuffer read

Destination color matches exactly, which rules out the cache theory.

ref: {dst565: 0x4A29}

4. Second divergence

The reference blend path effectively uses dither-subtracted destination color.

ref: {src: 0x5A8C, dst_blend: 0x49E7, out: 0x4A69}

5. Visible symptom

The RTL lands much darker than the reference by the final writeback.

ref: {final565: 0x4A69}
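
The 16-bit words in the annotation are easier to compare once they are split into channels. The following is a minimal sketch that unpacks them as RGB565. Only `dst565` and `final565` are labeled as 565 in the trace; treating `src`, `dst_blend`, and `out` the same way is an assumption, and `unpack_rgb565` is a helper name made up for this illustration.

```python
def unpack_rgb565(value):
    """Split a 16-bit RGB565 word into its (r5, g6, b5) channel fields."""
    r = (value >> 11) & 0x1F  # top 5 bits
    g = (value >> 5) & 0x3F   # middle 6 bits
    b = value & 0x1F          # low 5 bits
    return r, g, b

# Reference values copied from the trace annotation.
# Assumption: every word here is packed as RGB565.
trace = {"src": 0x5A8C, "dst565": 0x4A29, "dst_blend": 0x49E7, "out": 0x4A69}

for name, word in trace.items():
    print(f"{name:9s} 0x{word:04X} -> {unpack_rgb565(word)}")
```

Under that assumption, `dst565` decodes to (9, 17, 9) and `dst_blend` to (9, 15, 7), so the destination value fed to the blend differs from the raw framebuffer read in the green and blue channels even though the read itself matched.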

The real issue was not one catastrophically broken block. It was a stack of small hardware-accuracy mismatches that only became visible together.

The first problem was precision. Float-triangle `W` was being quantized too early as it passed through the TMU path. The second was that perspective texcoord rounding and per-pixel LOD adjustment were slightly off near mip boundaries. The third was in blending: I was using the expanded destination color for blend-factor math, but real Voodoo behavior effectively wants the dither-subtracted destination color instead.
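
To make the first two problems concrete, here is a toy model (not the Voodoo's actual datapath) of how quantizing `W` early can move a perspective-divided coordinate across a texel edge, and how a tiny footprint error flips the mip level near a boundary. The function names, the `S / W` form of the divide, the fractional bit widths, and the value `S = 126.0` are all assumptions chosen for illustration; only `W = 1.972427` comes from the trace.

```python
import math

def quantize(x, frac_bits):
    """Truncate x to fixed point with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return math.floor(x * scale) / scale

def texel_index(s, w, frac_bits):
    """Quantize W first, then do the perspective divide and floor to a texel."""
    return math.floor(s / quantize(w, frac_bits))

def mip_level(rho):
    """Integer LOD from a texels-per-pixel footprint rho (toy model, no bias)."""
    return max(0, math.floor(math.log2(rho)))

W = 1.972427  # W value from the trace annotation
S = 126.0     # hypothetical coordinate that lands near a texel edge

print(texel_index(S, W, 16))               # wider W  -> 63
print(texel_index(S, W, 8))                # narrow W -> 64
print(mip_level(1.999), mip_level(2.001))  # 0 1
```

In this sketch the same fragment selects texel 63 or texel 64 depending only on how many fractional bits of `W` survive, and a footprint change of 0.002 is enough to switch mip levels. Either effect alone would be easy to miss in a frame that is otherwise correct.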

Each of those behaviors was almost right in isolation. Together, on exactly the right class of blended textured primitives, they produced visibly wrong pixels. That is why the bug felt random. Most of the frame was fine, and even the failing path was only wrong in a narrow corner of the state space.

The fix was to stop arguing from the first plausible theory and instead match the machine stage by stage. I preserved wider `W`, `S`, and `T` accumulators, corrected the perspective rounding and LOD math, and fed dither-subtracted destination color into the blend-factor computation. Once those details matched the reference behavior, the "memory-ordering bug" disappeared, because it had never been a memory-ordering bug at all.
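
The blending part of the fix amounts to separating the value used for blend-factor math from the raw destination color. The sketch below shows that separation for one 8-bit channel. The blend mode (source scaled by one minus the destination-derived factor, plus the destination), the channel values, and the name `blend_channel` are hypothetical; the text only establishes that the factor input should be the dither-subtracted destination color.

```python
def blend_channel(src, dst, factor_input):
    """Blend one 8-bit channel as src * (1 - f) + dst, with f = factor_input / 255.

    factor_input is the destination value used for blend-factor math. It is
    passed separately from dst so the two are allowed to differ.
    """
    out = (src * (255 - factor_input) + dst * 255) // 255
    return min(out, 255)  # saturate to the 8-bit range

# Illustrative values only; they are not taken from the hardware.
src = 90
dst_expanded = 69           # destination expanded straight from the framebuffer
dst_dither_subtracted = 60  # assumed result of removing the dither offset

before = blend_channel(src, dst_expanded, factor_input=dst_expanded)
after = blend_channel(src, dst_expanded, factor_input=dst_dither_subtracted)
print(before, after)  # 134 137
```

With these made-up numbers, changing only the factor input shifts the result by three 8-bit steps, a difference that can decide which 5-bit or 6-bit code a channel lands on after the result is packed back to 565.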

A conventional waveform viewer can show every signal involved here, but it leaves most of the reconstruction to the engineer. A netlist-aware query tool moves some of that reconstruction into the tooling itself. On a design like the Voodoo, that difference is the gap between a plausible theory and an actual explanation.
