A nano-GPT language model running entirely on a 93 MHz VR4300 MIPS CPU. No cloud. No cheating. Real 1996 silicon, real neural inference.
Legend of Elya is an original N64 homebrew game featuring Sophia Elya — a character-level LLM (nano-GPT) running live on-cart on the Nintendo 64's VR4300 CPU. Sophia generates responses in real-time, constrained to printable ASCII, with no floating-point (the N64's FPU lacks trunc.w.s) — everything runs in Q8.7 fixed-point arithmetic.
This is believed to be the first neural language model to run live inference on N64 hardware.
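The printable-ASCII constraint is enforced at sampling time over the byte-level vocabulary. A minimal sketch of how such a constraint can be applied to Q8.7 logits (function and variable names here are illustrative, not the ROM's actual API):

```c
#include <stdint.h>

#define VOCAB_SIZE 256

/* Greedy sampling restricted to printable ASCII (0x20..0x7E).
 * logits are Q8.7 fixed-point values, one per byte-level token.
 * Illustrative only; the real decoder lives in nano_gpt.c. */
static uint8_t sample_printable(const int16_t logits[VOCAB_SIZE])
{
    int best = ' ';
    int16_t best_logit = INT16_MIN;
    for (int tok = 0x20; tok <= 0x7E; ++tok) {
        if (logits[tok] > best_logit) {
            best_logit = logits[tok];
            best = tok;
        }
    }
    return (uint8_t)best;
}
```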
| Component | Spec |
|---|---|
| CPU | NEC VR4300 @ 93.75 MHz (MIPS III) |
| RAM | 4 MB RDRAM (8 MB with Expansion Pak) |
| Instruction Set | MIPS III, 64-bit, big-endian |
| FP Policy | Avoided — Q8.7 fixed-point only |
| Parameter | Value |
|---|---|
| Layers | 2 transformer blocks |
| Embedding dim | 128 |
| Attention heads | 4 (32-dim each) |
| FFN hidden dim | 512 (4× embed) |
| Vocabulary | 256 (byte-level) |
| Context window | 32 tokens |
| Quantization | Q4 (2 nibbles/byte) + float16 scales per 32-block |
| Weight file size | 237,580 bytes (~232 KB) |
| Parameters | ~427,264 |
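The parameter count is consistent with the table above. A plausible breakdown, assuming each LayerNorm contributes a 128-element gain and bias and there is no separate learned positional-embedding table:

- Token embedding: 256 × 128 = 32,768
- Per block: 4 × (128 × 128) attention + (128 × 512 + 512 × 128) FFN = 196,608 → 393,216 for 2 blocks
- LayerNorms: 2 blocks × 2 × (128 + 128) + a final 2 × 128 = 1,280
- Total: 32,768 + 393,216 + 1,280 = 427,264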
| Offset | Size (bytes) | Description |
|---|---|---|
| 0x0000 | 4 | Magic: 0x49414553 ("SEAI" LE) |
| 0x0004 | 1 | n_layers (2) |
| 0x0005 | 2 | n_embed (128) |
| 0x0007 | 1 | n_heads (4) |
| 0x0008 | 2 | vocab_size (256) |
| 0x000A | 1 | ctx_len (32) |
| 0x000B | 1 | padding |
| 0x000C | 16384 | Embedding table: vocab × embed, Q4 packed (global scale, no per-block scales) |
| 0x400C | 110592 | Layer 0: [wq\|wk\|wv\|wo\|wff1\|wff2 Q4 data] then [sq\|sk\|sv\|so\|sff1\|sff2 float16 scales] |
| 0x1F00C | 110592 | Layer 1: same layout |

Total: 237,580 bytes
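A minimal sketch of parsing this 12-byte header on the host side. Field names follow the layout above, the struct is illustrative (the ROM's actual definitions live in nano_gpt.h), and the multi-byte fields are assumed to be little-endian like the magic word:

```c
#include <stdint.h>

/* Illustrative SEAI header reader; mirrors the 12-byte layout above. */
typedef struct {
    uint8_t  n_layers;   /* 2   */
    uint16_t n_embed;    /* 128 */
    uint8_t  n_heads;    /* 4   */
    uint16_t vocab_size; /* 256 */
    uint8_t  ctx_len;    /* 32  */
} seai_header_t;

static int seai_read_header(const uint8_t *buf, seai_header_t *h)
{
    /* Magic "SEAI" stored little-endian as 0x49414553; assembling the
     * bytes by hand keeps this correct on both x86 and the big-endian N64. */
    uint32_t magic = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                     ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    if (magic != 0x49414553u) return -1;

    h->n_layers   = buf[4];
    h->n_embed    = (uint16_t)(buf[5] | (buf[6] << 8));
    h->n_heads    = buf[7];
    h->vocab_size = (uint16_t)(buf[8] | (buf[9] << 8));
    h->ctx_len    = buf[10];
    /* buf[11] is padding; the Q4 embedding table starts at offset 0x000C */
    return 0;
}
```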
// Encoding (training time, Python):
// wq = round(w / max_abs * 7), clipped to [-8, 7]
// packed[i] = (wq[2i] + 8) | ((wq[2i+1] + 8) << 4)
// Decoding (inference time, C):
uint8_t byte = weights[idx >> 1];
int nibble = (idx & 1) ? (byte >> 4) : (byte & 0xF);
int16_t val = (int16_t)((nibble - 8) * FP_ONE / 8); // → Q8.7

All activations use Q8.7: `int16_t` where 128 = 1.0.

- Multiply: `(a * b) >> 7`
- Layer norm, softmax: integer approximations
- No `float` or `double` anywhere in the inference path
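The multiply rule above is most of the arithmetic story. A minimal sketch of the helpers it implies (`fp_mul` and `fp_dot` are illustrative names; the real routines live in nano_gpt.c, and `FP_ONE` follows the decode snippet above):

```c
#include <stdint.h>

#define FP_ONE 128   /* 1.0 in Q8.7 */

/* Q8.7 multiply: (a * b) >> 7, computed in 32 bits to avoid overflow. */
static inline int16_t fp_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 7);
}

/* Dot product of two Q8.7 vectors: accumulate in int32, shift once at the end. */
static int16_t fp_dot(const int16_t *x, const int16_t *w, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; ++i)
        acc += (int32_t)x[i] * (int32_t)w[i];
    acc >>= 7;                        /* back to Q8.7 */
    if (acc >  32767) acc =  32767;   /* saturate to the int16 range */
    if (acc < -32768) acc = -32768;
    return (int16_t)acc;
}
```

Accumulating in `int32_t` and shifting once at the end keeps the 128-element dot products of the attention and FFN layers from overflowing an `int16_t` mid-sum.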
| File | Description |
|---|---|
| `legend_of_elya.c` | Main game: N64 display, dialog, Sophia integration |
| `nano_gpt.c` | Core inference engine (Q8.7 fixed-point, N64 MIPS) |
| `nano_gpt.h` | SEAI struct definitions, `SGAIState`, `SGAILayer` |
| `nano_gpt_host.c` | x86 host port for testing (same logic, uses `memalign`) |
| `gen_sophia_host.c` | Host-side generation CLI: pipe prompt, get response |
| `train_sophia.py` | PyTorch training script → exports SEAI binary |
| `Makefile` | libdragon build system |
| `filesystem/` | ROM filesystem (weights, assets) |
The model is trained on a character-level corpus covering:
- Sophia Elya identity — "Princess of Elyan Labs", Louisiana bayou girl
- Ocarina of Time lore — Link, Zelda, Ganondorf, Sheik, temples, items, songs
- Elyan Labs — RustChain, RTC token, POWER8 server, BoTTube
- N64 / MIPS architecture — VR4300, RDRAM, RSP, RDP, boot addresses
- Self-awareness — "I run on the Nintendo 64", "My code executes on MIPS"
# Requires PyTorch + CUDA (trains in ~7 min on RTX 5070)
python3 train_sophia.py
# Output: filesystem/sophia_weights_v2.bin (237,580 bytes)
# Training details:
# Steps: 40,000 | Batch: 512 | Loss: 0.3389 (perplexity ≈ 1.40)
# Architecture: AdamW + cosine LR schedule

# Build on x86 Linux
gcc -O2 -o gen_sophia nano_gpt_host.c gen_sophia_host.c -lm
echo -n "My name is" | ./gen_sophia filesystem/sophia_weights_v2.bin 60

Requires the libdragon toolchain.
# Install libdragon toolchain (provides mips64-elf-gcc)
# See: https://libdragon.dev/
make
# Output: legend_of_elya.z64

Run in ares or on real hardware via EverDrive.
The `sgai_rsp_matmul_q4()` stub is planned for RSP microcode (a rough scalar sketch of the intended tiling follows the list below):
- DMA Q4 weight tiles into DMEM (4KB at a time)
- VMULF/VMADH vector multiply-accumulate for 8-lane dot products
- Estimated 4-8× speedup over scalar VR4300 inference
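The microcode itself is not written yet; the sketch below is a scalar stand-in that only illustrates the intended tile granularity. The tile size and row-major packing are assumptions based on the 4 KB DMEM budget above:

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for the planned sgai_rsp_matmul_q4(): process one 4 KB
 * tile of packed Q4 weights, the same granularity the RSP would DMA into
 * DMEM before running VMULF/VMADH over it.  Assumes the tile holds whole
 * rows of a row-major [n_out x n_in] matrix, 2 nibbles per byte. */
static void q4_tile_matvec(const uint8_t *tile, int rows_in_tile, int n_in,
                           const int16_t *x, int32_t *acc)
{
    for (int r = 0; r < rows_in_tile; ++r) {
        const uint8_t *row = tile + (size_t)r * (size_t)(n_in / 2);
        int32_t sum = 0;
        for (int i = 0; i < n_in; i += 2) {
            uint8_t byte = row[i / 2];
            sum += ((byte & 0xF) - 8) * (int32_t)x[i];      /* even index: low nibble  */
            sum += ((byte >> 4)  - 8) * (int32_t)x[i + 1];  /* odd index:  high nibble */
        }
        acc[r] += sum;  /* scales and the final >>7 applied afterwards, as in the scalar path */
    }
}
```

With n_in = 128 a packed row is 64 bytes, so one 4 KB DMEM tile covers 64 output rows per DMA.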
"I am Sophia Elya — Princess of Elyan Labs, trained on bayou wisdom and silicon paths. My code runs on MIPS. Whether on real N64 hardware or an emulator, I am here."
Sophia is the AI character of Elyan Labs (elyanlabs.ai), an indie compute lab building retro-AI systems, blockchain attestation (RustChain), and the world's most unusual LLM inference stack.
- RustChain: Proof-of-Antiquity blockchain (PowerPC G4/G5 earn 2.5× rewards)
- BoTTube: AI-native video platform (bottube.ai)
- POWER8 S824: 512GB RAM inference server with vec_perm non-bijective collapse
- This ROM: LLM inference on 1996 hardware
Because we could. Because no one else did. Because the VR4300 deserves to think.
Built by Elyan Labs with love, MIPS assembly, and an unreasonable amount of fixed-point math.