Show HN: I ran a language model on a PS2

Original link: https://github.com/xaskasdf/ps2-llm

## LLM on the PS2: A Retro AI Feat

A developer has successfully run a large language model (LLM) on the PlayStation 2, a 26-year-old console, proving it is feasible even under severe hardware constraints. This was achieved by building a custom PS2 SDK from scratch and cleverly streaming the model weights from CD-ROM, sidestepping the console's limited 32 MB of memory.

The current default model, brandon-tiny-10m-instruct (10M parameters, Q8 precision), uses a custom quantized format (PSNT) to minimize size. Larger models (up to 77 MB) also run, at the cost of more CD-ROM reads.

The project relies on keeping only the essential data — activations, KV cache, and token embeddings — in RAM during inference. Python tools are provided to convert models from Hugging Face format into the PS2-compatible PSNT format, supporting ternary, Q4, and Q8 quantization levels. The full project, including source code and model details, is available at [naranjositos.tech](https://naranjositos.tech).

A developer has successfully run a 10-million-parameter language model on the PlayStation 2, a feat previously considered impossible given the console's mere 32 MB of memory. The key is streaming model weights from CD-ROM one matrix at a time, keeping only essentials such as activations in RAM during processing.

To achieve this, the developer created a custom quantized format called PSNT, rebuilt parts of the PS2 SDK, and even trained a model specifically for the hardware. The PSNT format prioritizes compatibility with the PS2's constraints over aggressive quantization for size, resulting in negligible quality loss compared to FP16.

While the tokens-per-second figure is still being refined, the project demonstrates a clever workaround for hardware limits, mirroring techniques used in modern edge inference. The developer plans to share a more detailed timing breakdown of each processing stage.

Original article

Running a large language model on a PlayStation 2.

This project started as an experiment born from two passions: retrogaming and LLMs. Having built a complete PS2 SDK from scratch (including tools that had to be rewritten due to incompatibilities with modern software and hardware), and having extensive experience working with language models, the question after seeing a team run an LLM on a Windows 98 PC was simple: "Can I run this on a 26-year-old game console?"

The answer is yes. The PS2's Emotion Engine (MIPS-III @ 294 MHz, 32 MB RAM) can run transformer inference by streaming model weights from CD-ROM one matrix at a time, keeping only activations and KV cache in memory. The current default model is brandon-tiny-10m-instruct, a custom 10M-parameter architecture running at Q8 precision.

Website: naranjositos.tech

PS2 LLM Demo running on PlayStation 2

The PS2 has 32 MB of RAM total. Model weights don't need to fit in memory -- the inference engine streams them from CD-ROM one matrix at a time during the forward pass. Only activations, KV cache, token embeddings, and RMS norms stay in RAM.

This means models much larger than 32 MB can run on the console. A 77 MB model works -- it just reads more from CD. See MODELS.md for details on all models tested.
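Some back-of-the-envelope arithmetic shows why the resident state fits comfortably. Using hypothetical dimensions for a ~10M-parameter model (the actual config is in MODELS.md; these numbers are illustrative only):

```python
# Rough RAM budget for the data that must stay resident during
# inference. All dimensions below are assumed, not the project's
# actual configuration.
dim, n_layers, seq_len, vocab = 288, 6, 256, 8192
bytes_f32 = 4

kv_cache    = 2 * n_layers * seq_len * dim * bytes_f32  # keys + values
embeddings  = vocab * dim * bytes_f32                   # token table
activations = 8 * dim * bytes_f32                       # a few work buffers

total_mb = (kv_cache + embeddings + activations) / 2**20
print(f"resident state: ~{total_mb:.1f} MB")  # well under 32 MB
```

The weights never appear in this budget, which is why a 77 MB model can run on a 32 MB console: model size bounds CD-ROM traffic, not memory.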

Several models were tested during development. The current default is brandon-tiny-10m (Q8, ~10.4 MB), chosen for its balance of speed and coherence on PS2 hardware.

See MODELS.md for detailed specs, conversion pipelines, and guidance on adding new models.

Requires the ps2_biw_engine SDK at ../ps2_biw_engine.

Output: build/ps2_llm_demo.elf (ELF) and build/ps2_llm_demo.iso (bootable CD image).

Requires Python with numpy and torch:

# Brandon model (custom architecture): safetensors -> PSNT v3 Q8
python3 tools/brandon_to_psnt.py --quant q8 \
    third_party/brandon-tiny/model.safetensors \
    cd_rom/DATA/LLM/brandon-q8.psnt

# Standard HuggingFace models: HF -> llama2.c float32 -> PSNT
python3 tools/hf_to_llama2c.py third_party/model-dir/ model.bin
python3 tools/q4_quantize.py model.bin model.psnt    # Q4
python3 tools/psnt_quantize.py model.bin model.psnt  # Ternary

# SentencePiece tokenizer -> binary
python3 tools/sp_tokenizer_to_llama2c.py \
    third_party/brandon-tiny/tokenizer.model \
    cd_rom/DATA/LLM/tok8k.bin

Weights use the PSNT (PS Net) binary format, a compact quantized format designed for the PS2's constraints. Supports ternary (2-bit), Q4 (4-bit), and Q8 (8-bit) quantization. See PSNT.md for the full specification.
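To give a feel for what Q8 quantization involves, here is a minimal symmetric per-group int8 scheme: each group of values stores one float32 scale plus the int8 codes. This is a generic sketch; PSNT's actual group size and on-disk layout are specified in PSNT.md:

```python
import numpy as np

GROUP = 32  # hypothetical group size; see PSNT.md for the real layout

def q8_quantize(w):
    """Symmetric per-group int8 quantization.

    Each group of GROUP floats is scaled so its largest magnitude
    maps to 127, then rounded to int8. Returns codes and scales.
    """
    w = w.reshape(-1, GROUP)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def q8_dequantize(q, scale):
    """Reconstruct approximate float32 weights from codes and scales."""
    return (q.astype(np.float32) * scale).reshape(-1)
```

Ternary and Q4 follow the same group-plus-scale pattern with coarser codebooks ({-1, 0, +1} and 4-bit values respectively), trading reconstruction error for size.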

  • game_main.c -- PS2 entry point, UI, controller input, CD-ROM streaming
  • llama_ps2.c -- Self-contained LLM inference engine (included inline by game_main.c)
  • game_scene.c -- Engine scene callback stubs
  • tools/ -- Python conversion and verification scripts
  • cd_rom/ -- Runtime data (models, tokenizers, IOP modules) burned to CD image
  • PSNT.md -- Model format specification
  • MODELS.md -- Model support details and history

See individual model licenses on their HuggingFace pages.
