Decompiling Xbox games using PDB debug info

Original link: https://i686.me/blog/csplit/

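## Decompiling Halo 1: a new approach to Xbox reverse engineering

Recent decompilation techniques focus on "matching": rebuilding the original code by comparing extracted objects byte for byte. Traditionally this involved manual disassembly and splitting of the binary. Tools can now automate this process and generate object files directly. Applying it to original Xbox games, however, is challenging.

This project aims to decompile the PAL debug build of Halo 1, taking advantage of its valuable Program Database (PDB) file. Existing x86 splitters cannot make effective use of the PDB and treat it merely as a symbol map. A custom splitter was developed that uses the PDB's "section contributions", a detailed record of how the object files were originally laid out, to reconstruct objects accurately, even from a stripped PDB.

Challenges included supporting the obsolete Visual C++ 7 beta 2 debug info format and handling structured exception handlers (SEH). The initial build hit runtime errors caused by incorrect string formatting and by faulty relocation symbolisation stemming from a compiler optimisation. Although many problems were fixed by hand, some crashes remain, which highlights the complexity of a full reconstruction.

The project is still in progress. Future work will focus on integrating control flow generation and exploring "matching linking" to resolve the remaining errors. Code and progress are public on GitHub and decomp.dev.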

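## Hacker News discussion: decompiling Xbox games using PDBs

A recent Hacker News post sparked a discussion about decompiling old Xbox games using debug info (PDB files). The original poster highlighted successful explorations of games such as Splinter Cell, Thief and Deus Ex, and stressed the value of understanding how these classics were made.

The conversation quickly turned to broader questions of software copyright and the benefits of open-sourcing old games. Many commenters argued that copyright terms should be shorter than the current life of the author plus 70 years, on the grounds that the current term limits creativity and leads to the loss of cultural heritage. Concerns were raised about the lack of mandatory source code escrow, which hinders remake efforts and prevents community contributions.

Several users discussed the challenges of decompilation, noting how hard it is to turn machine code into readable, refactorable source. Others pointed to the historical background of copyright law, which was originally designed for easily modified text, whereas modern software is far more complex. The discussion also touched on practical tools and projects related to decompilation, including a fork of decomp-toolkit targeting Xbox 360 games.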

Original article

26 January, 2026

In the world of matching decompilation, projects use a tool which consumes an input binary and lifts objects for comparison and linkage.

Doing this lets them employ a divide and conquer strategy where a game is reverse engineered, object by object, until the game is buildable, and matches on a byte or instruction level using fully original code. In the dark ages, every project started with a disassembly of its target binary. The boundaries between different object files ("splits") were determined manually through heuristics and processes of elimination.

Fast forward some years and the decompilation landscape has developed considerably. Many different open-source toolkits called splitters exist which let you establish a decompilation workflow for all sorts of targets. Code is processed by control flow generation algorithms that lift relocations. You never deal with disassemblies because the tools write object files for you.

decomp toolkit

Currently most decomps target 6th and 7th generation PowerPC console games using decomp-toolkit.

But what if we want to decompile an original Xbox game? Specifically, I'm working on the PAL debug build of Halo 1, since it has debug symbols in the form of a Program Database (PDB) file.

The vast majority of x86 reverse engineering projects load a DLL which hooks functions in the base executable. Nobody has managed instruction level matching with this approach, which is something I wanted. And only the debug Xbox kernels even support loading DLLs (in the form of "debugger extensions").

Instead I started looking at splitting tools for x86, of which there were two solutions: either scripts to postprocess disassemblies or plugins for IDA Pro and Ghidra that work based on exporting the tools' symbol databases.

Section contributions

I couldn't find a tool that actually read the PDBs for splitting. The best the existing tools could do was use the PDB like a map file to be consumed by another reverse engineering tool to provide symbolisation.

Using the PDB this way meant that I'd miss out on a lot of information. The VC++ linker logs the individual sections of input COFF objects being linked into the output executable to the PDB in the form of section contributions.

These structures contain:

  • the address of the section in the binary,
  • the original size and flags (including alignment) of the section,
  • the specific object the section came from,
  • and a checksum for the section's data itself.

This is very useful in the context of splitting objects because it's a record of how all object files were laid out verbatim!
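As a rough illustration of how these records can drive splitting, here is a minimal Python sketch that parses an array of section contribution entries and groups them by the object they came from. It assumes the 28-byte `SectionContribEntry` layout described in LLVM's PDB documentation for modern PDBs, with any stream header already stripped; the older debug info format discussed later in this post may differ, and the function names are hypothetical.

```python
import struct
from collections import defaultdict
from typing import NamedTuple

# Assumed layout: 28-byte SectionContribEntry as documented by LLVM for
# modern PDBs. Older PDB versions may use a different record.
SC_FORMAT = "<HxxiiIHxxII"
SC_SIZE = struct.calcsize(SC_FORMAT)  # 28 bytes


class SectionContrib(NamedTuple):
    section: int          # 1-based index of the image section
    offset: int           # offset of the contribution inside that section
    size: int             # size of the contribution in bytes
    characteristics: int  # COFF section flags, including alignment bits
    module: int           # index of the object (module) it came from
    data_crc: int         # checksum of the section data
    reloc_crc: int        # checksum of the section relocations


def parse_contribs(blob: bytes) -> list:
    """Decode every whole record in blob (hypothetical helper)."""
    count = len(blob) // SC_SIZE
    return [
        SectionContrib(*struct.unpack_from(SC_FORMAT, blob, i * SC_SIZE))
        for i in range(count)
    ]


def group_by_module(contribs) -> dict:
    """Map module index -> its contributions, sorted by (section, offset)."""
    splits = defaultdict(list)
    for contrib in contribs:
        splits[contrib.module].append(contrib)
    for pieces in splits.values():
        pieces.sort(key=lambda c: (c.section, c.offset))
    return dict(splits)
```

Each value in the returned dictionary is then a candidate list of sections for one reconstructed object file.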

And they still exist even in stripped PDBs. In my case, I have a stripped PDB, so I don't have type info or private symbols for the target. But we can still automatically enumerate every logical piece of data and code. Compared to other tools we are no longer guessing the sizes and locations of data symbols based on auto analysis.

So I wrote my own splitter that uses the section contributions data to create objects.

My specific target was built with Visual C++ 7 beta 2, which uses the old VC++ 2.00 debug info format that no-one implements support for, so to read the file I had to modify the Rust pdb crate. The official Microsoft crate can read these files now, but where's the fun in that?

Some information we don't have: names for private symbols, and COMDAT data (the rules for handling duplicate definitions of symbols). In my case I ignored COMDAT except where its use was plainly obvious, such as string and floating point constant deduplication. For nameless section contributions, the flags were useful for creating temporary names, because they let you prefix each symbol based on its contents, e.g. code_00401234.
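The naming scheme can be sketched in a few lines of Python. The `IMAGE_SCN_*` values are the standard COFF section characteristics; the `code` prefix comes from the example above, while the `bss`, `data` and `rdata` prefixes and the function name are assumptions made for illustration.

```python
# Standard COFF section characteristic flags.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080
IMAGE_SCN_MEM_WRITE = 0x80000000


def temp_name(characteristics: int, address: int) -> str:
    """Build a placeholder symbol name from section flags and an address.

    Only the "code" prefix appears in the article; the other prefixes
    are hypothetical.
    """
    if characteristics & IMAGE_SCN_CNT_CODE:
        prefix = "code"
    elif characteristics & IMAGE_SCN_CNT_UNINITIALIZED_DATA:
        prefix = "bss"
    elif characteristics & IMAGE_SCN_MEM_WRITE:
        prefix = "data"
    else:
        prefix = "rdata"
    return f"{prefix}_{address:08X}"


print(temp_name(0x60000020, 0x401234))
```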

Control flow generation

Every decompilation needs to identify all pointers referenced in its binary. We have a complete list of all absolute relocations in the binary thanks to the .reloc section, but nothing about relative relocations for jmp or call.
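For reference, the absolute relocations can be enumerated with something like the sketch below. It assumes the section uses the standard PE base relocation layout (blocks made of a page RVA, a block size, and 16-bit entries holding a 4-bit type and a 12-bit offset); the function name is hypothetical.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, carries no relocation
IMAGE_REL_BASED_HIGHLOW = 3   # full 32-bit absolute address


def parse_base_relocs(reloc: bytes) -> list:
    """Return the RVA of every 32-bit absolute relocation in a .reloc blob."""
    rvas = []
    pos = 0
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8:
            break  # malformed or terminating block
        end = min(pos + block_size, len(reloc))
        for off in range(pos + 8, end - 1, 2):
            (entry,) = struct.unpack_from("<H", reloc, off)
            # High 4 bits: relocation type. Low 12 bits: offset in the page.
            if entry >> 12 == IMAGE_REL_BASED_HIGHLOW:
                rvas.append(page_rva + (entry & 0x0FFF))
        pos += block_size
    return rvas
```

Each returned RVA is the location of a 32-bit pointer in the image that the splitter has to symbolise.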

To lift these relative relocations, you can either make a script to export from a tool like IDA or you can find them yourself using control flow generation. I opted for the latter because I wanted to keep the tool self contained.
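The core arithmetic behind lifting a relative reference is small: the target is the address of the next instruction plus the signed displacement. The Python sketch below covers only the common x86 relative branch encodings and assumes the caller already knows where each instruction starts, which in practice needs a real disassembler; it is not the author's implementation.

```python
import struct


def branch_target(address: int, insn: bytes):
    """Return the target of a relative call/jmp/jcc, or None otherwise.

    address is where the instruction starts; insn holds its bytes.
    """
    op = insn[0]
    if op in (0xE8, 0xE9) and len(insn) >= 5:
        # call rel32 / jmp rel32
        (rel,) = struct.unpack_from("<i", insn, 1)
        length = 5
    elif op == 0x0F and len(insn) >= 6 and 0x80 <= insn[1] <= 0x8F:
        # jcc rel32 (two-byte opcode)
        (rel,) = struct.unpack_from("<i", insn, 2)
        length = 6
    elif (op == 0xEB or 0x70 <= op <= 0x7F) and len(insn) >= 2:
        # jmp rel8 / jcc rel8
        (rel,) = struct.unpack_from("<b", insn, 1)
        length = 2
    else:
        return None
    # Displacements are relative to the end of the instruction.
    return (address + length + rel) & 0xFFFFFFFF
```

For example, `branch_target(0x37, bytes.fromhex("7c12"))` yields `0x4b`, matching the `jl` in the `_output` listing later in this post. Walking these targets recursively from known entry points is one way to discover which call and jmp operands need relocations.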

SafeSEH

Structured Exception Handling (SEH) is a Microsoft vendor-specific C language extension that lets you write exception handlers inside functions using __try, __except and __finally statements.

When entering a __try statement, the compiler generates a structure containing pointers to the __except or __finally handlers, which are blocks in the function. There's never a direct jump to these handlers, and without special treatment they are opaque to control flow generation. As a result, my control flow generation failed to find any except/finally blocks. I ended up fixing about 130 handlers manually.
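One way this could be automated is to read the handler pointers out of the scope table and feed them to control flow generation as extra entry points. The sketch below assumes the 12-byte scope table entry commonly documented for 32-bit MSVC's `_except_handler3` (enclosing level, filter pointer, handler pointer), and assumes the number of entries is already known, since the table itself does not store a count. All names are hypothetical, and this is not what the author did.

```python
import struct
from typing import NamedTuple, Optional


class ScopeEntry(NamedTuple):
    enclosing_level: int         # index of the enclosing __try, or -1
    filter_func: Optional[int]   # filter address; None for a __finally block
    handler_func: int            # address of the handler block


def parse_scope_table(data: bytes, count: int) -> list:
    """Decode count scope table entries (assumed 12 bytes each)."""
    entries = []
    for i in range(count):
        level, filt, handler = struct.unpack_from("<iII", data, i * 12)
        entries.append(ScopeEntry(level, filt or None, handler))
    return entries


def extra_entry_points(entries) -> list:
    """Collect addresses that are reached only through the scope table."""
    points = set()
    for entry in entries:
        points.add(entry.handler_func)
        if entry.filter_func is not None:
            points.add(entry.filter_func)
    return sorted(points)
```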

The first time the game linked, booting into it immediately kicked me back to the dashboard.

After investigating this I found that the Xbox runtime library was failing to assign the console's active cache partition to a drive letter. szCacheDrive was being formatted to "\Device\Harddisk0\Partitin%d\", so something must have broken inside of string formatting.

xapi function

Negative relocations

This turned out to be a failure in the symbolisation of relocation targets, triggered by a compiler optimisation. I traced the error to _output, the internal libcmt function that applies format strings:

  34:   80 fb 20                cmp    bl,0x20
  37:   7c 12                   jl     4b <__output+0x4b>
  39:   80 fb 78                cmp    bl,0x78
  3c:   7f 0d                   jg     4b <__output+0x4b>
  3e:   0f be c3                movsx  eax,bl
  41:   0f be 80 e0 ff ff ff    movsx  eax,BYTE PTR [eax-0x20]
                        44: dir32       ___lookuptable

___lookuptable only has entries for characters from 0x20 (space, the first printable ASCII character) up to 0x78. The loop checks that the current character is in bounds, then indexes ___lookuptable by that character minus 0x20.

Visual C++ optimises away the subtraction by applying a negative offset to the relocation. In this case the relocation points to 0x20 bytes before ___lookuptable.

RELOCATIONS #18
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000044  DIR32                      FFFFFFE0         9  ___lookuptable
 0000004F  DIR32                      00000000         9  ___lookuptable
 00000067  DIR32                      00000000        76  $L9491
 000001EE  DIR32                      00000000        6E  __pctype
 ...

This means that if you match relocation targets to symbols based on the last symbol before the target address, you are liable to generate bad relocations and probably crash the game.

movsx   eax, byte ptr ds:stru_6B5C08.HandlerFunc[eax] ; bad
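The failure mode is easy to reproduce. In the Python sketch below, the symbol addresses and the `some_scope_table` name are made up for illustration; only the 0xFFFFFFE0 addend comes from the relocation dump above. The naive lookup picks whichever symbol precedes the relocated address, whereas the original object expressed the reference as ___lookuptable with an addend of -0x20.

```python
import bisect


def to_signed32(value: int) -> int:
    """Interpret a 32-bit relocation addend as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def naive_symbol(symbols, target):
    """Pick the last symbol at or before target.

    symbols is a list of (address, name) sorted by address.
    Returns (name, offset) or None if target precedes every symbol.
    """
    addresses = [address for address, _ in symbols]
    index = bisect.bisect_right(addresses, target) - 1
    if index < 0:
        return None
    address, name = symbols[index]
    return name, target - address


# Hypothetical layout: another symbol sits directly before the table.
symbols = [(0x1000, "some_scope_table"), (0x1028, "___lookuptable")]

addend = to_signed32(0xFFFFFFE0)  # "Applied To" value from the dump
linked_value = 0x1028 + addend    # what ends up in the linked binary

print(addend)                               # -32
print(naive_symbol(symbols, linked_value))  # ('some_scope_table', 8)
```

The linked value alone does not say which interpretation is right, which is why this is hard to resolve without the original relocation records.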

Maybe some kind of heuristic could help here. But what I found good enough was to manually fix it, and after patching a few of these in libcmt and d3d8 the game made it through init.

ida debug

Unfortunately there are still some bad pointers. The game can't load into another map from the menu successfully, and if you idle long enough, the content streaming thread in the game crashes inside of a kernel export.

I haven't found the root cause for the crash. The idea is that in the future we'll get to all of these just by finding them naturally through the decompilation process.

There is also the possibility of matching linking so that bad relocations simply disappear. Hopefully there isn't undefined behaviour involved with determining the linker order.

pregame menu

You can visit the repo for Halo on GitHub and the progress tracker at decomp.dev.

There is a (pretty bad) C object writer tool that the repository is currently working off of available here. In the future, I'm going to rewrite this to include the CFG step. I swear.
