Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB

Original link: https://github.com/HarryR/z80ai

## Z80-μLM: A Tiny AI Built for Vintage Hardware

Z80-μLM is a surprisingly practical conversational AI designed to run on the venerable Z80 processor with only 64KB of RAM. Delivered as a 40KB .COM binary, the project shows how small an AI can get while still displaying signs of "personality".

It generates text character by character using a distinctive approach: input is converted into a "tag cloud" via trigram hashing (which tolerates typos), and weights are aggressively quantized to just 2 bits. Inference relies on efficient 16-bit integer arithmetic, avoiding floating point entirely.

While it cannot perform complex reasoning, Z80-μLM ships with pre-trained examples, including a simple chatbot and a 20 Questions game. It excels at terse, nuanced replies, nudging the user toward interactive, probing questions. The project also provides tools for training custom models with LLMs such as Ollama or Claude.

The project highlights what is possible when running AI on constrained hardware, proving that even under severe limitations you can achieve functionality, and even a hint of charm.


## Original Article

Z80-μLM is a 'conversational AI' that generates short character-by-character sequences, using quantization-aware training (QAT) to run on a Z80 processor with 64KB of RAM.

At the root of this project was a question: how small can we go while still having personality? Can it be trained or fine-tuned easily, with easy self-hosted distribution?

The answer is yes! A 40KB .COM binary (including inference, weights & a chat-style UI) running on a 4MHz processor from 1976.

It won't pass the Turing test, but it might make you smile at the green screen.

For insight on how to best train your own model, see TRAINING.md.

Two pre-built examples are included:

A conversational chatbot trained on casual Q&A pairs. Responds to greetings, questions about itself, and general banter with terse personality-driven answers.

> hello
HI
> are you a robot
YES
> do you dream
MAYBE

A 20 Questions game where the model knows a secret topic and answers YES/NO/MAYBE to your questions. Guess correctly to WIN.

> is it alive
YES
> is it big
YES
> does it have a trunk
YES
> is it grey
MAYBE
> elephant
WIN

Includes tools for generating training data with LLMs (Ollama or Claude API) and balancing class distributions.

  • Trigram hash encoding: Input text is hashed into 128 buckets - typo-tolerant, word-order invariant
  • 2-bit weight quantization: Each weight is {-2, -1, 0, +1}, packed 4 per byte
  • 16-bit integer inference: All math uses Z80-native 16-bit signed arithmetic
  • ~40KB .COM file: Fits in CP/M's Transient Program Area (TPA)
  • Autoregressive generation: Outputs text character-by-character
  • No floating point: Everything is integer math with fixed-point scaling
  • Interactive chat mode: Just run CHAT with no arguments

The model doesn't understand you. But somehow, it gets you.

Your input is hashed into 128 buckets via trigram encoding - an abstract "tag cloud" representation. The model responds to the shape of your input, not the exact words:

"hello there"  →  [bucket 23: 64, bucket 87: 32, ...]
"there hello"  →  [bucket 23: 64, bucket 87: 32, ...]  (same!)
"helo ther"    →  [bucket 23: 32, bucket 87: 32, ...]  (similar - typo tolerant)

This is semantically powerful for short inputs, but there's a limit: longer or order-dependent sentences blur together as concepts compete for the same buckets. "Open the door and turn on the lights" will likely be too close to distinguish from "turn on the door and open the lights."
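The bucketing above can be sketched in Python. The hash function and per-trigram weighting here are illustrative assumptions, not the repo's actual encoder; the point is the order-invariant, typo-tolerant "tag cloud":

```python
def tag_cloud(text, n_buckets=128):
    """Hash each character trigram of the input into one of
    n_buckets counters -- an order-invariant 'tag cloud'."""
    buckets = [0] * n_buckets
    for word in text.lower().split():
        padded = f" {word} "            # pad so short words still yield trigrams
        for i in range(len(padded) - 2):
            tri = padded[i:i + 3]
            h = 0
            for ch in tri:              # simple rolling hash (placeholder scheme)
                h = (h * 31 + ord(ch)) & 0xFFFF
            buckets[h % n_buckets] += 1
    return buckets

# Word order does not change the representation:
assert tag_cloud("hello there") == tag_cloud("there hello")
# A typo only perturbs a few buckets instead of changing everything:
a, b = tag_cloud("hello there"), tag_cloud("helo there")
shared = sum(min(x, y) for x, y in zip(a, b))
```

Because trigrams are hashed per word, swapping word order produces an identical vector, while a typo changes only the few trigrams it touches.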

Small Responses, Big Meaning

A 1-2 word response can convey surprising nuance:

  • OK - acknowledged, neutral
  • WHY? - questioning your premise
  • R U? - casting existential doubt
  • MAYBE - genuine uncertainty
  • AM I? - reflecting the question back

This isn't necessarily a limitation - it's a different mode of interaction. The terse responses force you to infer meaning from context, or to ask direct, probing yes/no questions to see whether it understands (e.g. 'are you a bot', 'are you human', 'am i human' elicit logically consistent memorized answers).

What this works well for:

  • Short, varied inputs with consistent categorized outputs
  • Fuzzy matching (typos, rephrasing, word order)
  • Personality through vocabulary choice
  • Running on constrained 8-bit hardware

What it is not:

  • A chatbot that generates novel sentences
  • Something that tracks multi-turn context deeply
  • A parser that understands grammar
  • Anything approaching general intelligence

It's small, but functional. And sometimes that's exactly what you need.

  • Input: 128 query trigram buckets + 128 context buckets
  • Hidden layers: Configurable depth/width, e.g., 256 → 192 → 128
  • Output: One neuron per character in charset
  • Activation: ReLU between hidden layers
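Under those shapes, the whole forward pass is a stack of small integer matrix-vector products. A shape-level sketch in Python (the dimensions and charset below are illustrative, taken from the example above; this is not code from the repo):

```python
# Illustrative dimensions: 256 inputs (128 query + 128 context buckets),
# example hidden widths from above, one output neuron per charset character.
CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ?"
LAYER_DIMS = [256, 192, 128, len(CHARSET)]

def forward(x, layers):
    """layers[k] is a list of weight rows, each weight in {-2, -1, 0, +1}."""
    for k, W in enumerate(layers):
        # Integer dot product per output neuron, then arithmetic shift by 2
        x = [sum(w * a for w, a in zip(row, x)) >> 2 for row in W]
        if k < len(layers) - 1:            # ReLU between hidden layers only
            x = [max(v, 0) for v in x]
    return x  # one integer score per character; argmax picks the next char
```

Generation is then autoregressive: pick the best-scoring character, append it, and run the network again.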

The Z80 is an 8-bit CPU, but we use its 16-bit register pairs (HL, DE, BC) for activations and accumulators. Weights are packed 4-per-byte (2-bit each) and unpacked into 8-bit signed values for the multiply-accumulate.

The 16-bit accumulator gives us numerical stability (summing 256 inputs without overflow), but the model's expressiveness is still bottlenecked by the 2-bit weights, and naive training may overflow or act 'weirdly' without QAT.

The core of inference is a tight multiply-accumulate loop. Weights are packed 4-per-byte:

; Unpack 2-bit weight from packed byte
ld a, (PACKED)      ; Get packed weights
and 03h             ; Mask bottom 2 bits
sub 2               ; Map 0,1,2,3 → -2,-1,0,+1
ld (WEIGHT), a

; Rotate for next weight
ld a, (PACKED)
rrca
rrca
ld (PACKED), a
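For reference, the same unpack-and-rotate sequence expressed in Python (a sketch mirroring the assembly above):

```python
def unpack_weights(packed_byte):
    """Unpack four 2-bit weights from one byte, low bits first.
    Stored codes 0..3 map to -2..+1 (the SUB 2 in the assembly)."""
    b = packed_byte & 0xFF
    weights = []
    for _ in range(4):
        weights.append((b & 0x03) - 2)              # AND 03h, SUB 2
        b = ((b >> 2) | ((b & 0x03) << 6)) & 0xFF   # RRCA twice: rotate right by 2
    return weights

# 0b11100100 packs the codes 0,1,2,3 (low pair first):
# unpack_weights(0b11100100) -> [-2, -1, 0, 1]
```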

The multiply-accumulate handles the 4 possible weight values:

MULADD:
    or a
    jr z, DONE       ; weight=0: skip entirely
    jp m, NEG        ; weight<0: subtract
    ; weight=+1: add activation
    ld hl, (ACC)
    add hl, de
    ld (ACC), hl
DONE:
    ret
NEG:
    cp 0FFh
    jr z, NEG1       ; weight=-1
    ; weight=-2: subtract twice
    ld hl, (ACC)
    or a             ; clear carry (CP above sets it for 0FEh)
    sbc hl, de
    or a             ; clear borrow between the two subtractions
    sbc hl, de
    ld (ACC), hl
    ret
NEG1:
    ; weight=-1: subtract once
    ld hl, (ACC)
    or a             ; clear carry before SBC
    sbc hl, de
    ld (ACC), hl
    ret

After each layer, arithmetic right-shift by 2 to prevent overflow:

sra h        ; Shift right arithmetic (preserves sign)
rr l
sra h
rr l         ; ACC = ACC / 4

That's the entire neural network: unpack weight, multiply-accumulate, shift. Repeat ~100K times per character generated.
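Putting those three steps together, one layer of inference might look like this in Python (an illustrative sketch of the scheme described above, not code from the repo):

```python
def layer_forward(activations, packed_rows, relu=True):
    """One layer: unpack 2-bit weights, multiply-accumulate,
    then arithmetic-shift the sum right by 2."""
    outputs = []
    for packed in packed_rows:                 # one packed weight row per neuron
        acc = 0
        for i, a in enumerate(activations):
            byte = packed[i // 4]              # 4 weights per byte
            w = ((byte >> (2 * (i % 4))) & 0x03) - 2   # code 0..3 -> -2..+1
            acc += w * a                       # multiply-accumulate
        acc >>= 2                              # SRA H / RR L twice: divide by 4
        outputs.append(max(acc, 0) if relu else acc)
    return outputs
```

Note that Python's `>>` on a negative integer is an arithmetic shift, matching the sign-preserving `sra`/`rr` pair in the assembly.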


License: MIT or Apache-2.0 as you see fit.
