Running a large language model on a PlayStation 2.
This project started as an experiment born from two passions: retro gaming and LLMs. After building a complete PS2 SDK from scratch (including tools that had to be rewritten due to incompatibilities with modern software and hardware) and gaining extensive experience with language models, seeing a team run an LLM on a Windows 98 PC raised a simple question: can this run on a 26-year-old game console?
The answer is yes. The PS2's Emotion Engine (MIPS-III @ 294 MHz, 32 MB RAM) can run transformer inference by streaming model weights from CD-ROM one matrix at a time, keeping only activations and KV cache in memory. The current default model is brandon-tiny-10m-instruct, a custom 10M-parameter architecture running at Q8 precision.
Website: naranjositos.tech
The PS2 has 32 MB of RAM total. Model weights don't need to fit in memory -- the inference engine streams them from CD-ROM one matrix at a time during the forward pass. Only activations, KV cache, token embeddings, and RMS norms stay in RAM.
This means models much larger than 32 MB can run on the console. A 77 MB model works -- it just reads more from CD. See MODELS.md for details on all models tested.
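The streaming idea above can be sketched in a few lines of Python. This is a simplified illustration only: the real engine (llama_ps2.c) works in C on quantized PSNT matrices, so the function name and the row-major float32 layout here are assumptions for the sake of the example.

```python
import numpy as np

def streamed_matvec(f, x, rows, cols):
    """Multiply a (rows x cols) float32 matrix stored in file f by
    vector x, reading one row at a time.  Only one row of weights is
    resident at any moment -- the same principle the PS2 engine uses
    when streaming each matrix from CD-ROM during the forward pass,
    keeping just activations and KV cache in RAM."""
    out = np.empty(rows, dtype=np.float32)
    for r in range(rows):
        row = np.frombuffer(f.read(cols * 4), dtype=np.float32)
        out[r] = row @ x
    return out
```

Peak weight memory is O(cols) instead of O(rows * cols), which is why a 77 MB model fits through a 32 MB console, at the cost of re-reading from disc on every forward pass.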
Several models were tested during development. The current default is brandon-tiny-10m (Q8, ~10.4 MB), chosen for its balance of speed and coherence on PS2 hardware.
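The ~10.4 MB figure is consistent with Q8 storage: roughly one byte per weight, plus a few percent for quantization scales and metadata. The 4% overhead below is an assumption chosen to match the quoted size, not a number taken from the PSNT spec:

```python
params = 10_000_000            # ~10M weights
weight_bytes = params          # Q8: one int8 byte per weight
overhead = 0.04                # assumed: scales, headers, higher-precision tensors
total_mb = weight_bytes * (1 + overhead) / 1e6
print(f"{total_mb:.1f} MB")    # -> 10.4 MB
```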
See MODELS.md for detailed specs, conversion pipelines, and guidance on adding new models.
Requires the ps2_biw_engine SDK at ../ps2_biw_engine.
Output: build/ps2_llm_demo.elf and build/ps2_llm_demo.iso (bootable CD image).
Requires Python with numpy and torch:
# Brandon model (custom architecture): safetensors -> PSNT v3 Q8
python3 tools/brandon_to_psnt.py --quant q8 \
third_party/brandon-tiny/model.safetensors \
cd_rom/DATA/LLM/brandon-q8.psnt
# Standard HuggingFace models: HF -> llama2.c float32 -> PSNT
python3 tools/hf_to_llama2c.py third_party/model-dir/ model.bin
python3 tools/q4_quantize.py model.bin model.psnt # Q4
python3 tools/psnt_quantize.py model.bin model.psnt # Ternary
# SentencePiece tokenizer -> binary
python3 tools/sp_tokenizer_to_llama2c.py \
third_party/brandon-tiny/tokenizer.model \
cd_rom/DATA/LLM/tok8k.bin

Weights use the PSNT (PS Net) binary format, a compact quantized format designed for the PS2's constraints. Supports ternary (2-bit), Q4 (4-bit), and Q8 (8-bit) quantization. See PSNT.md for the full specification.
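As an illustration of the Q8 scheme, here is a minimal symmetric per-tensor quantizer. The actual PSNT layout (header fields, block structure, how ternary and Q4 pack values) is defined in PSNT.md; this sketch only shows the basic idea of 8-bit quantization:

```python
import numpy as np

def q8_quantize(w):
    """Symmetric 8-bit quantization: store int8 values plus one
    float32 scale; dequantize as q * scale."""
    amax = float(np.abs(w).max())
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return scale, q

def q8_dequantize(scale, q):
    return q.astype(np.float32) * scale
```

Each weight costs one byte instead of four, and the worst-case rounding error is half a quantization step (scale / 2), which is why Q8 stays close to float32 quality while quartering the model size.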
game_main.c -- PS2 entry point, UI, controller input, CD-ROM streaming
llama_ps2.c -- Self-contained LLM inference engine (included inline by game_main.c)
game_scene.c -- Engine scene callback stubs
tools/ -- Python conversion and verification scripts
cd_rom/ -- Runtime data (models, tokenizers, IOP modules) burned to the CD image
PSNT.md -- Model format specification
MODELS.md -- Model support details and history
See individual model licenses on their HuggingFace pages.
