Show HN: TRiP – a complete transformer engine in C built from scratch just by me

Original link: https://github.com/carlovalenti/TRiP

## TRiP: a Transformer Engine Built from Scratch

TRiP is a lightweight, all-in-one C engine for Transformer AI models. Developed over 18 months as a personal learning project, it supports inference, training, tokenizer creation, chat, and vision for models such as Gemma, Llama 2, PaliGemma, and GPT-2.

The project's main goal is educational: to build a deep understanding of transformer internals through a hand-coded implementation, avoiding external frameworks such as PyTorch or TensorFlow. While it does not aim to compete with optimized libraries like llama.cpp, TRiP provides full backpropagation training and supports AdamW and a variety of inference sampling methods.

Key features include support for SafeTensors and Karpathy formats, multiple weight types (bf16, fp16, fp32), and memory optimizations. A small portion of the code uses AI-generated components for tasks such as JSON parsing and image handling.

TRiP is designed for learning and experimentation, offering a clean, well-commented codebase that demonstrates the transformer architecture and backpropagation. It is licensed under CC BY-NC 4.0 for non-commercial use only.


Original text

A few-file, all-in-one C engine for Transformer AI models: inference, training, tokenizer creation, chat, and vision.

Built from scratch over 18 months (from March 2024 to August 2025) during my lunch breaks and weekend nights, TRiP exists just because I wanted to truly understand the transformer internals - from the matrix multiplications up.

TRiP's purpose is purely educational, for me and for anyone willing to learn about transformers. It supports Gemma 1, Llama 2, PaliGemma, and GPT-2, with full inference and training. It does not aim to track the latest model releases, and is not trying to compete with llama.cpp.

NOTE: since people are asking: here's what's AI-generated in the code:

  • the json parser (with some fixes)
  • the safetensors checkpoint save function
  • the whole jpeg-X11 management functions (I had no interest in developing them)
  • the final file split (I initially wrote everything as main.c :D )
  • some revisions of the comments before I made the commit
  • this readme, for the most part :D

That's all, I think; the rest is all hand-coded by me. It would have made no sense otherwise, since the whole point of doing this was to get as close as possible to a full-stack understanding of the transformer internals.

  • Architectures: Llama2, Gemma 1.0/1.1, PaliGemma 1 (vision+language), GPT-2
  • Checkpoint formats: SafeTensors (HuggingFace), Karpathy's llama2.c and gpt2 formats
  • Weight types: bf16, float16, float32
  • Training: full backpropagation with AdamW, cosine annealing LR, gradient clipping (see the sketch after this list)
  • Tokenizer: BPE (SentencePiece-compatible), with vocabulary creation from scratch
  • Inference: greedy, top-k, and nucleus (top-p) sampling
  • Chat: interactive chat with Llama, Gemma, and TinyLlama chat templates
  • Vision: multimodal inference with PaliGemma (JPEG input, X11 display)
  • Memory: RAM-optimized mode via mmap for large models on limited hardware
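
For a taste of what the training path implements, here is a minimal sketch of a decoupled AdamW step and a cosine-annealed learning rate; the struct and signatures are illustrative, not TRiP's actual API.

```c
#include <math.h>

/* One AdamW step for a parameter tensor (illustrative sketch, not
 * TRiP's optimizer code). m and v are persistent first/second moment
 * buffers; wd is the decoupled weight-decay coefficient. */
typedef struct { float lr, beta1, beta2, eps, wd; long t; } AdamW;

void adamw_step(AdamW *o, float *w, const float *g, float *m, float *v, int n) {
    o->t++;
    float bc1 = 1.0f - powf(o->beta1, (float)o->t);  /* bias corrections */
    float bc2 = 1.0f - powf(o->beta2, (float)o->t);
    for (int i = 0; i < n; i++) {
        m[i] = o->beta1 * m[i] + (1.0f - o->beta1) * g[i];
        v[i] = o->beta2 * v[i] + (1.0f - o->beta2) * g[i] * g[i];
        float mhat = m[i] / bc1, vhat = v[i] / bc2;
        /* decoupled weight decay: applied to w directly, not via g */
        w[i] -= o->lr * (mhat / (sqrtf(vhat) + o->eps) + o->wd * w[i]);
    }
}

/* Cosine annealing: lr falls from lr_max to lr_min over total steps. */
float cosine_lr(long step, long total, float lr_min, float lr_max) {
    const float PI = 3.14159265358979f;
    float c = 0.5f * (1.0f + cosf(PI * (float)step / (float)total));
    return lr_min + (lr_max - lr_min) * c;
}
```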
Requirements

gcc (recommended: version 13 or higher, to get support for bfloat16; with OpenMP support)
libjpeg-dev (or libjpeg62-turbo-dev)
libx11-dev

WARNING: do NOT expect higher performance with bfloat16 or float16 on CPUs; today's CPUs are not optimized for floating point operations in such formats, and float32 always performs best. That surprised me a lot, too.
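
For intuition on why: bf16 is simply the top 16 bits of a float32, so a CPU without native bf16 arithmetic must widen every weight before it can multiply, one element at a time. A minimal sketch of that conversion (illustrative, not TRiP's code):

```c
#include <stdint.h>
#include <string.h>

/* bf16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
 * bits of a float32. Widening is a 16-bit shift, but it must happen
 * for every element, which is why bf16 matmuls tend to be slower
 * than plain float32 on CPUs without native bf16 support. */
static inline float bf16_to_fp32(uint16_t b) {
    uint32_t bits = (uint32_t)b << 16;  /* low mantissa bits become zero */
    float f;
    memcpy(&f, &bits, sizeof f);        /* safe type-punning */
    return f;
}

/* Dot product over bf16 weights: one conversion per element. */
float dot_bf16(const uint16_t *w, const float *x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += bf16_to_fp32(w[i]) * x[i];
    return acc;
}
```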

On Debian:

sudo apt install build-essential libomp-dev libjpeg62-turbo-dev libx11-dev

On Ubuntu:

sudo apt install build-essential libomp-dev libjpeg-dev libx11-dev

TRiP runs natively under WSL (Windows Subsystem for Linux). To enable the X11 display features (vision mode, image display), install an X server on the Windows side, such as VcXsrv.

Then in your WSL terminal, before running TRiP, point DISPLAY at that X server. On WSL1 this is typically:

export DISPLAY=localhost:0

If using WSL2 (most setups), use instead:

export DISPLAY=$(awk '/nameserver/ {print $2}' /etc/resolv.conf):0

X11 is only needed for vision mode. Chat, inference, and training work without it.

That's it. No cmake, no external frameworks, no Python. Just make.

Download a Gemma-2B-IT model from HuggingFace (safetensors format), then:

./trip --chat \
    --checkpoint gemma-2b-it/model.safetensors \
    --tokenizer gemma-2b-it/tokenizer.json \
    --chat_scheme GEMMA
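
With --chat_scheme GEMMA, each message is wrapped in Gemma's published turn markers before decoding, roughly like this hypothetical helper (not TRiP's actual code):

```c
#include <stdio.h>

/* Gemma's chat template: user text is framed by turn markers, and
 * generation continues after the opening of the model turn. */
void format_gemma_turn(char *buf, size_t cap, const char *user_msg) {
    snprintf(buf, cap,
             "<start_of_turn>user\n%s<end_of_turn>\n<start_of_turn>model\n",
             user_msg);
}
```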

Run inference on a prompt

./trip --decode \
    --input_text "The capital of Italy is" \
    --checkpoint gemma-2b-it/model.safetensors \
    --tokenizer gemma-2b-it/tokenizer.json

Or from a text file:

./trip --decode prompt.txt \
    --checkpoint gemma-2b-it/model.safetensors \
    --tokenizer gemma-2b-it/tokenizer.json

Train a model

./trip --train \
    --checkpoint my_model/model.safetensors \
    --tokenizer my_model/tokenizer.json \
    --train_data my_dataset.txt \
    --train_config training_args.json

Run vision inference on an image

./trip --vision photo.jpg \
    --checkpoint paligemma/model.safetensors \
    --tokenizer paligemma/tokenizer.json \
    --input_text "Describe this image"

Build a tokenizer vocabulary from scratch

./trip --build_vocab corpus.txt --vocab_size 32000 --tokenizer my_tokenizer.json
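
The core of BPE vocabulary building is iterative pair merging: count adjacent token pairs, merge the most frequent into a fresh id, and repeat until the target vocabulary size is reached. A naive single merge step, just to show the idea (not TRiP's implementation):

```c
/* One BPE merge step over a token sequence (naive O(n^2) pair count,
 * fine for a sketch). Returns the new length; every occurrence of the
 * most frequent pair is replaced by token id `new_id`. */
int bpe_merge_step(int *toks, int len, int new_id) {
    int best_a = -1, best_b = -1, best_count = 0;
    for (int i = 0; i + 1 < len; i++) {          /* find most frequent pair */
        int a = toks[i], b = toks[i + 1], count = 0;
        for (int j = 0; j + 1 < len; j++)
            if (toks[j] == a && toks[j + 1] == b) count++;
        if (count > best_count) { best_count = count; best_a = a; best_b = b; }
    }
    if (best_count < 2) return len;              /* nothing worth merging */
    int k = 0;                                    /* rewrite, merging pairs */
    for (int i = 0; i < len; i++) {
        if (i + 1 < len && toks[i] == best_a && toks[i + 1] == best_b) {
            toks[k++] = new_id;
            i++;                                  /* skip the merged partner */
        } else {
            toks[k++] = toks[i];
        }
    }
    return k;
}
```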
USAGE:
  ./trip <ACTION> [OPTIONS...]

| Flag | Description |
|------|-------------|
| `--decode [file]` | Run inference on a prompt (from file, `--input_text`, or stdin) |
| `--chat` | Interactive chat session |
| `--vision [image.jpg]` | Multimodal inference with an image |
| `--train` | Train the model |
| `--create` | Create a new model from a configuration file |
| `--build_vocab <data.txt>` | Build a new tokenizer vocabulary from a text corpus |
| `--utest` | Run unit tests |
| `--help` | Show help |

Model & tokenizer options

| Flag | Default | Description |
|------|---------|-------------|
| `--checkpoint <path>` | default.model | Path to model checkpoint file(s) |
| `--checkpoint_type <type>` | SAFETENSORS | Format: SAFETENSORS, LLAMA2_AK, GPT2_AK |
| `--configuration <path>` | (auto) | Path to config.json (for SafeTensors) |
| `--tokenizer <path>` | default.tokenizer | Path to tokenizer file |
| `--tokenizer_format <type>` | JSON_HUGGINGFACE | Format: JSON_HUGGINGFACE, LLAMA2_AK, GPT2_AK |
| `--tokenizer_type <type>` | SENTENCEPIECE | Algorithm: SENTENCEPIECE, TRIP |

Inference & sampling options

| Flag | Default | Description |
|------|---------|-------------|
| `--input_text "<prompt>"` | | Provide prompt text directly on the command line |
| `--system_prompt "<text>"` | | System prompt for chat mode |
| `--chat_scheme <scheme>` | (none) | Chat template: LLAMA, TINY_LLAMA, GEMMA |
| `--chat_save_context <file>` | | Pre-process and save chat context for faster startup |
| `--chat_load_context <file>` | | Load a previously saved chat context |
| `--temperature <value>` | 1.0 | Sampling temperature; 0.0 = greedy (always pick the most probable token) |
| `--top_p <value>` | 0.9 | Nucleus sampling: sample from the smallest set of tokens whose cumulative probability exceeds this value |
| `--top_k <value>` | (disabled) | Top-k sampling: sample from the k most probable tokens |
| `--ram` | (off) | Memory-map weights instead of loading them (slower, uses less RAM) |
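
For reference, nucleus sampling as selected by --top_p has this general shape; a minimal sketch assuming an already-softmaxed probability array (not TRiP's actual sampler):

```c
#include <stdlib.h>

/* Sort token probabilities descending, keep the smallest prefix whose
 * cumulative probability exceeds top_p (the "nucleus"), then sample
 * proportionally within it. */
typedef struct { float p; int id; } ProbIndex;

static int cmp_desc(const void *a, const void *b) {
    float pa = ((const ProbIndex *)a)->p, pb = ((const ProbIndex *)b)->p;
    return (pa < pb) - (pa > pb);
}

int sample_top_p(const float *probs, int vocab, float top_p, float coin) {
    /* coin is a uniform random number in [0, 1) */
    ProbIndex *sorted = malloc(vocab * sizeof *sorted);
    if (!sorted) return 0;
    for (int i = 0; i < vocab; i++) { sorted[i].p = probs[i]; sorted[i].id = i; }
    qsort(sorted, vocab, sizeof *sorted, cmp_desc);

    float cum = 0.0f;                 /* find the nucleus boundary */
    int last = vocab - 1;
    for (int i = 0; i < vocab; i++) {
        cum += sorted[i].p;
        if (cum > top_p) { last = i; break; }
    }

    float r = coin * cum, acc = 0.0f; /* sample within the nucleus */
    int id = sorted[last].id;
    for (int i = 0; i <= last; i++) {
        acc += sorted[i].p;
        if (r < acc) { id = sorted[i].id; break; }
    }
    free(sorted);
    return id;
}
```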
Training options

| Flag | Default | Description |
|------|---------|-------------|
| `--train_config <path>` | training_args.json | Path to training configuration JSON |
| `--train_data <path>` | training_data.txt | Path to training data (plain text) |
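
The --ram flag above trades speed for memory by mapping the checkpoint instead of copying it into RAM. The underlying technique, in a minimal sketch assuming a raw weight file (hypothetical loader, not TRiP's):

```c
#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a checkpoint read-only: pages are faulted in from disk on first
 * touch instead of being copied into heap RAM up front, so a model
 * larger than physical memory can still run (slowly). */
float *map_weights(const char *path, size_t *out_bytes) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* mapping stays valid after close */
    if (p == MAP_FAILED) return NULL;
    *out_bytes = (size_t)st.st_size;
    return (float *)p;
}
```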

TRiP is organized into 7 files. Open trip.h for the complete map.

| File | Lines | What it contains |
|------|-------|------------------|
| trip.h | ~900 | The map. Every type, struct, global, and declaration. |
| math.c | ~3000 | Tensor ops, each forward+backward pair side by side: matmul, softmax, layernorm, RMSnorm, RoPE, attention, FFN activations, vector arithmetic |
| forward.c | ~1500 | Forward pass orchestration + token sampling |
| backward.c | ~1500 | Backward pass + AdamW optimizer + gradient management |
| model.c | ~5500 | Checkpoint I/O, model init, memory management, tokenizer, vision preprocessing |
| utils.c | ~1000 | Logging, JSON parser, terminal I/O, JPEG/X11 image handling |
| main.c | ~1900 | CLI argument parsing, chat loop, training loop, inference loop |

How it works (for the curious)

TRiP implements a transformer from first principles in C. No PyTorch, no TensorFlow, no ONNX — just linear algebra on arrays of floats.

The residual stream is the central concept: a vector that flows through the model like data on a bus. Each layer reads from it, processes it through attention and a feed-forward network, and writes back to it. The forward pass walks the layers top to bottom; the backward pass walks them bottom to top, computing gradients via the chain rule.
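
In code, that pattern reduces to "compute a delta, add it back". A toy illustration of one layer's shape, with stand-in sublayers rather than TRiP's real attention and FFN:

```c
#define DIM 8

/* Stand-in for a sublayer (attention or feed-forward). Here it is
 * just a scaling, so the example stays self-contained. */
static void sublayer(float *out, const float *in, float scale, int n) {
    for (int i = 0; i < n; i++)
        out[i] = scale * in[i];
}

/* One layer: each block reads the residual stream, computes a delta,
 * and adds it back, so information accumulates additively. */
void layer_forward(float *x /* the residual stream, length DIM */) {
    float delta[DIM];
    sublayer(delta, x, 0.5f, DIM);                   /* "attention" */
    for (int i = 0; i < DIM; i++) x[i] += delta[i];  /* write back */
    sublayer(delta, x, 0.25f, DIM);                  /* "feed-forward" */
    for (int i = 0; i < DIM; i++) x[i] += delta[i];
}
```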

Every math operation (math.c) is implemented as a forward+backward pair: you can read rmsnorm() and immediately below it rmsnorm_backward(), and see exactly how the gradient flows through the same computation in reverse.
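
To make the pairing concrete, here is what such a forward+backward pair can look like for RMSNorm; a hand-derived sketch, not copied from math.c:

```c
#include <math.h>

/* Forward: y[i] = w[i] * x[i] / rms(x), rms = sqrt(mean(x^2) + eps). */
void rmsnorm(float *y, const float *x, const float *w, int n) {
    const float eps = 1e-5f;
    float ss = 0.0f;
    for (int i = 0; i < n; i++) ss += x[i] * x[i];
    float s = 1.0f / sqrtf(ss / n + eps);   /* 1 / rms */
    for (int i = 0; i < n; i++) y[i] = w[i] * x[i] * s;
}

/* Backward: given dL/dy, accumulate dL/dx and dL/dw via the chain
 * rule. Since y[i] = w[i] * x[i] * s and s depends on every x[j],
 * dL/dx has a direct term plus a correction through s. */
void rmsnorm_backward(float *dx, float *dw, const float *dy,
                      const float *x, const float *w, int n) {
    const float eps = 1e-5f;
    float ss = 0.0f, dot = 0.0f;
    for (int i = 0; i < n; i++) ss += x[i] * x[i];
    float s = 1.0f / sqrtf(ss / n + eps);
    for (int i = 0; i < n; i++) dot += dy[i] * w[i] * x[i];
    for (int i = 0; i < n; i++) {
        dw[i] += dy[i] * x[i] * s;                            /* dL/dw */
        dx[i] += s * (dy[i] * w[i] - x[i] * s * s * dot / n); /* dL/dx */
    }
}
```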

I put a lot of comments in the code, both as reminders to myself and to make TRiP essentially an annotated textbook on transformers.

For a deeper understanding of backpropagation, see Andrej Karpathy's lecture; TRiP would never have existed without his work.

CC BY-NC 4.0 — free to use, study, modify, and share for non-commercial purposes, with attribution. For commercial licensing, contact the author.

  • Andrej Karpathy — for llama2.c, nanoGPT, and the lectures that made all of this possible
  • Google — for releasing the Gemma model family
  • Meta — for releasing the Llama model family