Show HN: TRiP – a complete transformer engine in C built from scratch just by me

Original link: https://github.com/carlovalenti/TRiP

## TRiP: a Transformer Engine Built from Scratch

TRiP is a lightweight, all-in-one C engine for Transformer AI models. Developed over 18 months as a personal learning project, it supports inference, training, tokenizer creation, chat, and vision for models such as Gemma, Llama 2, PaliGemma, and GPT-2.

The project's main goal is educational: to build a deep understanding of transformer internals through a hand-coded implementation, avoiding external frameworks such as PyTorch or TensorFlow. While it does not aim to compete with optimized libraries like llama.cpp, TRiP provides full backpropagation training and supports AdamW and a variety of inference sampling methods.

Key features include support for SafeTensors and Karpathy formats, multiple weight types (bf16, fp16, fp32), and memory optimizations. A small portion of the code uses AI-generated components for tasks such as JSON parsing and image handling.

TRiP is designed for learning and experimentation, offering a clean, well-commented codebase that demonstrates the transformer architecture and backpropagation. It is licensed under CC BY-NC 4.0 for non-commercial use only.


Original text

A few-file, all-in-one C engine for Transformer AI models: inference, training, tokenizer creation, chat, and vision.

Built from scratch over 18 months (from March 2024 to August 2025) during my lunch breaks and weekend nights, TRiP exists just because I wanted to truly understand the transformer internals - from the matrix multiplications up.

TRiP's purpose is purely educational, for me and for anyone willing to learn about transformers. It supports Gemma 1, Llama 2, PaliGemma, and GPT-2, with full inference and training. It does not aim to track the latest model releases, and is not trying to compete with llama.cpp.

NOTE: since people are asking: here's what's AI-generated in the code:

  • the json parser (with some fixes)
  • the safetensors checkpoint save function
  • the whole jpeg-X11 management functions (I had no interest in developing them)
  • the final file split (I initially wrote everything as main.c :D )
  • some revisions of the comments before I made the commit
  • this readme, for the most part :D

That's all, I think; the rest is all hand-coded by me. It would have made no sense otherwise, since the whole point of doing this was to get as close as possible to a full-stack understanding of the transformer internals.

  • Architectures: Llama2, Gemma 1.0/1.1, PaliGemma 1 (vision+language), GPT-2
  • Checkpoint formats: SafeTensors (HuggingFace), Karpathy's llama2.c and gpt2 formats
  • Weight types: bf16, float16, float32
  • Training: full backpropagation with AdamW, cosine annealing LR, gradient clipping (see the sketch after this list)
  • Tokenizer: BPE (SentencePiece-compatible), with vocabulary creation from scratch
  • Inference: greedy, top-k, and nucleus (top-p) sampling
  • Chat: interactive chat with Llama, Gemma, and TinyLlama chat templates
  • Vision: multimodal inference with PaliGemma (JPEG input, X11 display)
  • Memory: RAM-optimized mode via mmap for large models on limited hardware
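
For a taste of what the training path implements, here is a minimal sketch of a decoupled AdamW step and a cosine-annealed learning rate; the struct and signatures are illustrative, not TRiP's actual API.

```c
#include <math.h>

/* One AdamW step for a parameter tensor (illustrative sketch, not
 * TRiP's optimizer code). m and v are persistent first/second moment
 * buffers; wd is the decoupled weight-decay coefficient. */
typedef struct { float lr, beta1, beta2, eps, wd; long t; } AdamW;

void adamw_step(AdamW *o, float *w, const float *g, float *m, float *v, int n) {
    o->t++;
    float bc1 = 1.0f - powf(o->beta1, (float)o->t);  /* bias corrections */
    float bc2 = 1.0f - powf(o->beta2, (float)o->t);
    for (int i = 0; i < n; i++) {
        m[i] = o->beta1 * m[i] + (1.0f - o->beta1) * g[i];
        v[i] = o->beta2 * v[i] + (1.0f - o->beta2) * g[i] * g[i];
        float mhat = m[i] / bc1, vhat = v[i] / bc2;
        /* decoupled weight decay: applied to w directly, not via g */
        w[i] -= o->lr * (mhat / (sqrtf(vhat) + o->eps) + o->wd * w[i]);
    }
}

/* Cosine annealing: lr falls from lr_max to lr_min over total steps. */
float cosine_lr(long step, long total, float lr_min, float lr_max) {
    const float PI = 3.14159265358979f;
    float c = 0.5f * (1.0f + cosf(PI * (float)step / (float)total));
    return lr_min + (lr_max - lr_min) * c;
}
```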
Requirements

gcc (recommended: version 13 or higher, to get support for bfloat16; with OpenMP support)
libjpeg-dev (or libjpeg62-turbo-dev)
libx11-dev

WARNING: do NOT expect higher performance with bfloat16 or float16 on CPUs; today's CPUs are not optimized for floating point operations in such formats, and float32 always performs best. That surprised me a lot, too.
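
For intuition on why: bf16 is simply the top 16 bits of a float32, so a CPU without native bf16 arithmetic must widen every weight before it can multiply, one element at a time. A minimal sketch of that conversion (illustrative, not TRiP's code):

```c
#include <stdint.h>
#include <string.h>

/* bf16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
 * bits of a float32. Widening is a 16-bit shift, but it must happen
 * for every element, which is why bf16 matmuls tend to be slower
 * than plain float32 on CPUs without native bf16 support. */
static inline float bf16_to_fp32(uint16_t b) {
    uint32_t bits = (uint32_t)b << 16;  /* low mantissa bits become zero */
    float f;
    memcpy(&f, &bits, sizeof f);        /* safe type-punning */
    return f;
}

/* Dot product over bf16 weights: one conversion per element. */
float dot_bf16(const uint16_t *w, const float *x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += bf16_to_fp32(w[i]) * x[i];
    return acc;
}
```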

On Debian:

sudo apt install build-essential libomp-dev libjpeg62-turbo-dev libx11-dev

On Ubuntu:

sudo apt install build-essential libomp-dev libjpeg-dev libx11-dev

TRiP runs natively under WSL (Windows Subsystem for Linux). To enable the X11 display features (vision mode, image display), install an X server on the Windows side, such as VcXsrv.

Then in your WSL terminal, before running TRiP, point DISPLAY at that X server. On WSL1 this is typically:

export DISPLAY=localhost:0

If using WSL2 (most setups), use instead:

export DISPLAY=$(awk '/nameserver/ {print $2}' /etc/resolv.conf):0

X11 is only needed for vision mode. Chat, inference, and training work without it.

That's it. No cmake, no external frameworks, no Python. Just make.

Download a Gemma-2B-IT model from HuggingFace (safetensors format), then:

./trip --chat \
    --checkpoint gemma-2b-it/model.safetensors \
    --tokenizer gemma-2b-it/tokenizer.json \
    --chat_scheme GEMMA
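
With --chat_scheme GEMMA, each message is wrapped in Gemma's published turn markers before decoding, roughly like this hypothetical helper (not TRiP's actual code):

```c
#include <stdio.h>

/* Gemma's chat template: user text is framed by turn markers, and
 * generation continues after the opening of the model turn. */
void format_gemma_turn(char *buf, size_t cap, const char *user_msg) {
    snprintf(buf, cap,
             "<start_of_turn>user\n%s<end_of_turn>\n<start_of_turn>model\n",
             user_msg);
}
```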

Run inference on a prompt

./trip --decode \
    --input_text "The capital of Italy is" \
    --checkpoint gemma-2b-it/model.safetensors \
    --tokenizer gemma-2b-it/tokenizer.json

Or from a text file:

./trip --decode prompt.txt \
    --checkpoint gemma-2b-it/model.safetensors \
    --tokenizer gemma-2b-it/tokenizer.json

Train a model

./trip --train \
    --checkpoint my_model/model.safetensors \
    --tokenizer my_model/tokenizer.json \
    --train_data my_dataset.txt \
    --train_config training_args.json

Run vision inference on an image

./trip --vision photo.jpg \
    --checkpoint paligemma/model.safetensors \
    --tokenizer paligemma/tokenizer.json \
    --input_text "Describe this image"

Build a tokenizer vocabulary from scratch

./trip --build_vocab corpus.txt --vocab_size 32000 --tokenizer my_tokenizer.json
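
The core of BPE vocabulary building is iterative pair merging: count adjacent token pairs, merge the most frequent into a fresh id, and repeat until the target vocabulary size is reached. A naive single merge step, just to show the idea (not TRiP's implementation):

```c
/* One BPE merge step over a token sequence (naive O(n^2) pair count,
 * fine for a sketch). Returns the new length; every occurrence of the
 * most frequent pair is replaced by token id `new_id`. */
int bpe_merge_step(int *toks, int len, int new_id) {
    int best_a = -1, best_b = -1, best_count = 0;
    for (int i = 0; i + 1 < len; i++) {          /* find most frequent pair */
        int a = toks[i], b = toks[i + 1], count = 0;
        for (int j = 0; j + 1 < len; j++)
            if (toks[j] == a && toks[j + 1] == b) count++;
        if (count > best_count) { best_count = count; best_a = a; best_b = b; }
    }
    if (best_count < 2) return len;              /* nothing worth merging */
    int k = 0;                                    /* rewrite, merging pairs */
    for (int i = 0; i < len; i++) {
        if (i + 1 < len && toks[i] == best_a && toks[i + 1] == best_b) {
            toks[k++] = new_id;
            i++;                                  /* skip the merged partner */
        } else {
            toks[k++] = toks[i];
        }
    }
    return k;
}
```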
USAGE:
  ./trip <ACTION> [OPTIONS...]

| Flag | Description |
|------|-------------|
| `--decode [file]` | Run inference on a prompt (from file, `--input_text`, or stdin) |
| `--chat` | Interactive chat session |
| `--vision [image.jpg]` | Multimodal inference with an image |
| `--train` | Train the model |
| `--create` | Create a new model from a configuration file |
| `--build_vocab <data.txt>` | Build a new tokenizer vocabulary from a text corpus |
| `--utest` | Run unit tests |
| `--help` | Show help |

Model & tokenizer options

| Flag | Default | Description |
|------|---------|-------------|
| `--checkpoint <path>` | default.model | Path to model checkpoint file(s) |
| `--checkpoint_type <type>` | SAFETENSORS | Format: SAFETENSORS, LLAMA2_AK, GPT2_AK |
| `--configuration <path>` | (auto) | Path to config.json (for SafeTensors) |
| `--tokenizer <path>` | default.tokenizer | Path to tokenizer file |
| `--tokenizer_format <type>` | JSON_HUGGINGFACE | Format: JSON_HUGGINGFACE, LLAMA2_AK, GPT2_AK |
| `--tokenizer_type <type>` | SENTENCEPIECE | Algorithm: SENTENCEPIECE, TRIP |

Inference & sampling options

| Flag | Default | Description |
|------|---------|-------------|
| `--input_text "<prompt>"` | | Provide prompt text directly on the command line |
| `--system_prompt "<text>"` | | System prompt for chat mode |
| `--chat_scheme <scheme>` | (none) | Chat template: LLAMA, TINY_LLAMA, GEMMA |
| `--chat_save_context <file>` | | Pre-process and save chat context for faster startup |
| `--chat_load_context <file>` | | Load a previously saved chat context |
| `--temperature <value>` | 1.0 | Sampling temperature; 0.0 = greedy (always pick the most probable token) |
| `--top_p <value>` | 0.9 | Nucleus sampling: sample from the smallest set of tokens whose cumulative probability exceeds this value |
| `--top_k <value>` | (disabled) | Top-k sampling: sample from the k most probable tokens |
| `--ram` | (off) | Memory-map weights instead of loading them (slower, uses less RAM) |
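
For reference, nucleus sampling as selected by --top_p has this general shape; a minimal sketch assuming an already-softmaxed probability array (not TRiP's actual sampler):

```c
#include <stdlib.h>

/* Sort token probabilities descending, keep the smallest prefix whose
 * cumulative probability exceeds top_p (the "nucleus"), then sample
 * proportionally within it. */
typedef struct { float p; int id; } ProbIndex;

static int cmp_desc(const void *a, const void *b) {
    float pa = ((const ProbIndex *)a)->p, pb = ((const ProbIndex *)b)->p;
    return (pa < pb) - (pa > pb);
}

int sample_top_p(const float *probs, int vocab, float top_p, float coin) {
    /* coin is a uniform random number in [0, 1) */
    ProbIndex *sorted = malloc(vocab * sizeof *sorted);
    if (!sorted) return 0;
    for (int i = 0; i < vocab; i++) { sorted[i].p = probs[i]; sorted[i].id = i; }
    qsort(sorted, vocab, sizeof *sorted, cmp_desc);

    float cum = 0.0f;                 /* find the nucleus boundary */
    int last = vocab - 1;
    for (int i = 0; i < vocab; i++) {
        cum += sorted[i].p;
        if (cum > top_p) { last = i; break; }
    }

    float r = coin * cum, acc = 0.0f; /* sample within the nucleus */
    int id = sorted[last].id;
    for (int i = 0; i <= last; i++) {
        acc += sorted[i].p;
        if (r < acc) { id = sorted[i].id; break; }
    }
    free(sorted);
    return id;
}
```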
Training options

| Flag | Default | Description |
|------|---------|-------------|
| `--train_config <path>` | training_args.json | Path to training configuration JSON |
| `--train_data <path>` | training_data.txt | Path to training data (plain text) |
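
The --ram flag above trades speed for memory by mapping the checkpoint instead of copying it into RAM. The underlying technique, in a minimal sketch assuming a raw weight file (hypothetical loader, not TRiP's):

```c
#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a checkpoint read-only: pages are faulted in from disk on first
 * touch instead of being copied into heap RAM up front, so a model
 * larger than physical memory can still run (slowly). */
float *map_weights(const char *path, size_t *out_bytes) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* mapping stays valid after close */
    if (p == MAP_FAILED) return NULL;
    *out_bytes = (size_t)st.st_size;
    return (float *)p;
}
```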

TRiP is organized into 7 files. Open trip.h for the complete map.

| File | Lines | What it contains |
|------|-------|------------------|
| trip.h | ~900 | The map. Every type, struct, global, and declaration. |
| math.c | ~3000 | Tensor ops, each forward+backward pair side by side: matmul, softmax, layernorm, RMSnorm, RoPE, attention, FFN activations, vector arithmetic |
| forward.c | ~1500 | Forward pass orchestration + token sampling |
| backward.c | ~1500 | Backward pass + AdamW optimizer + gradient management |
| model.c | ~5500 | Checkpoint I/O, model init, memory management, tokenizer, vision preprocessing |
| utils.c | ~1000 | Logging, JSON parser, terminal I/O, JPEG/X11 image handling |
| main.c | ~1900 | CLI argument parsing, chat loop, training loop, inference loop |

How it works (for the curious)

TRiP implements a transformer from first principles in C. No PyTorch, no TensorFlow, no ONNX — just linear algebra on arrays of floats.

The residual stream is the central concept: a vector that flows through the model like data on a bus. Each layer reads from it, processes it through attention and a feed-forward network, and writes back to it. The forward pass walks the layers top to bottom; the backward pass walks them bottom to top, computing gradients via the chain rule.
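
In code, that pattern reduces to "compute a delta, add it back". A toy illustration of one layer's shape, with stand-in sublayers rather than TRiP's real attention and FFN:

```c
#define DIM 8

/* Stand-in for a sublayer (attention or feed-forward). Here it is
 * just a scaling, so the example stays self-contained. */
static void sublayer(float *out, const float *in, float scale, int n) {
    for (int i = 0; i < n; i++)
        out[i] = scale * in[i];
}

/* One layer: each block reads the residual stream, computes a delta,
 * and adds it back, so information accumulates additively. */
void layer_forward(float *x /* the residual stream, length DIM */) {
    float delta[DIM];
    sublayer(delta, x, 0.5f, DIM);                   /* "attention" */
    for (int i = 0; i < DIM; i++) x[i] += delta[i];  /* write back */
    sublayer(delta, x, 0.25f, DIM);                  /* "feed-forward" */
    for (int i = 0; i < DIM; i++) x[i] += delta[i];
}
```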

Every math operation (math.c) is implemented as a forward+backward pair: you can read rmsnorm() and immediately below it rmsnorm_backward(), and see exactly how the gradient flows through the same computation in reverse.
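
To make the pairing concrete, here is what such a forward+backward pair can look like for RMSNorm; a hand-derived sketch, not copied from math.c:

```c
#include <math.h>

/* Forward: y[i] = w[i] * x[i] / rms(x), rms = sqrt(mean(x^2) + eps). */
void rmsnorm(float *y, const float *x, const float *w, int n) {
    const float eps = 1e-5f;
    float ss = 0.0f;
    for (int i = 0; i < n; i++) ss += x[i] * x[i];
    float s = 1.0f / sqrtf(ss / n + eps);   /* 1 / rms */
    for (int i = 0; i < n; i++) y[i] = w[i] * x[i] * s;
}

/* Backward: given dL/dy, accumulate dL/dx and dL/dw via the chain
 * rule. Since y[i] = w[i] * x[i] * s and s depends on every x[j],
 * dL/dx has a direct term plus a correction through s. */
void rmsnorm_backward(float *dx, float *dw, const float *dy,
                      const float *x, const float *w, int n) {
    const float eps = 1e-5f;
    float ss = 0.0f, dot = 0.0f;
    for (int i = 0; i < n; i++) ss += x[i] * x[i];
    float s = 1.0f / sqrtf(ss / n + eps);
    for (int i = 0; i < n; i++) dot += dy[i] * w[i] * x[i];
    for (int i = 0; i < n; i++) {
        dw[i] += dy[i] * x[i] * s;                            /* dL/dw */
        dx[i] += s * (dy[i] * w[i] - x[i] * s * s * dot / n); /* dL/dx */
    }
}
```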

I put a lot of comments in the code, both as reminders to myself and to make TRiP essentially an annotated textbook on transformers.

For a deeper understanding of backpropagation, see Andrej Karpathy's lecture; TRiP would never have existed without his work.

CC BY-NC 4.0 — free to use, study, modify, and share for non-commercial purposes, with attribution. For commercial licensing, contact the author.

  • Andrej Karpathy — for llama2.c, nanoGPT, and the lectures that made all of this possible
  • Google — for releasing the Gemma model family
  • Meta — for releasing the Llama model family