Flux 2 Klein pure C inference

Original link: https://github.com/antirez/flux2.c

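## FLUX.2-klein-4B: AI image generation in pure C

This project is a text-to-image and image-to-image program written entirely in C. It needs only the C standard library, with optional MPS and BLAS acceleration. Developed by Salvatore, it is a demonstration of AI-assisted code generation: the entire code base was written by Claude Code, with the author writing no code by hand. The program uses the FLUX.2-klein-4B model from Black Forest Labs and takes a text prompt (and optionally an input image) to generate an image. It avoids the usual Python/PyTorch dependencies, with the goal of making open models more accessible and freer to use. Key features include zero dependencies, optional BLAS acceleration (about 30x faster), Metal GPU acceleration on Apple Silicon, a built-in Qwen3-4B text encoder, and memory efficiency through automatic release of the encoder. It supports image generation at resolutions up to 1024x1024. The library can be integrated into C/C++ projects. The model (about 16GB) is downloaded from HuggingFace and runs directly from the safetensors files, with no quantization or conversion step.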

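## Flux 2 Klein: AI-generated C inference

Salvatore Sanfilippo (the author of Redis) describes his experiment in using Claude Opus to reimplement the Flux 2 Klein image generation pipeline in pure C, aiming for accessibility without Python dependencies. He found that giving the AI a detailed specification and setting up a continuous feedback loop, including verification of the generated images, was the key to success. The project highlights the potential of LLMs to generate complex code, while also underlining the importance of human guidance and design choices. Although the generated C implementation is currently slower than the Python version (roughly 10x), it shows that AI-assisted low-level development is feasible. The discussion sparked debate about the role of LLMs in software creation, copyright questions related to AI training data, and the trade-off between performance and accessibility. Many commenters shared experiences with similar projects and strategies for using LLMs effectively for code generation, including detailed specifications, iterative refinement, and the importance of testing. The author also said that the Claude subscription cost for the project came to about $2.60.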

Original text

This program generates images from text prompts (and optionally from other images) using the FLUX.2-klein-4B model from Black Forest Labs. It can be used as a library as well, and is implemented entirely in C, with zero external dependencies beyond the C standard library. MPS and BLAS acceleration are optional but recommended.

An experiment in AI code generation and open source software

I (the human here, Salvatore) wanted to test code generation with a more ambitious task, over the weekend. This is the result. It is my first open source project where I wrote zero lines of code. I believe that inference systems not using the Python stack (which I do not appreciate) are a way to free open models usage and make AI more accessible. There is already a project doing the inference of diffusion models in C / C++ that supports multiple models, and is based on GGML. I wanted to see if, with the assistance of modern AI, I could reproduce this work in a more concise way, from scratch, in a weekend. Looks like it is possible.

This code base was written with Claude Code, using the Claude Max plan, the small one of ~80 euros per month. I almost reached the limits but this plan was definitely sufficient for such a large task, which was surprising. In order to simplify the usage of this software, no quantization is used, nor do you need to convert the model. It runs directly with the safetensors model as input, using floats.

Even though the code was generated using AI, my help in steering towards the right design, implementation choices, and correctness has been vital during the development. I learned quite a few things about working on non-trivial projects with AI.

# Build (choose your backend)
make mps       # Apple Silicon (fastest)
# or: make blas    # Intel Mac / Linux with OpenBLAS
# or: make generic # Pure C, no dependencies

# Download the model (~16GB)
pip install huggingface_hub
python download_model.py

# Generate an image
./flux -d flux-klein-model -p "A woman wearing sunglasses" -o output.png

That's it. No Python runtime, no PyTorch, no CUDA toolkit required at inference time.

Generated with: ./flux -d flux-klein-model -p "A picture of a woman in 1960 America. Sunglasses. ASA 400 film. Black and White." -W 250 -H 250 -o /tmp/woman.png, and later processed with image to image generation via ./flux -d flux-klein-model -i /tmp/woman.png -o /tmp/woman2.png -p "oil painting of woman with sunglasses" -v -H 256 -W 256

Zero dependencies: Pure C implementation, works standalone. BLAS optional for ~30x speedup (Apple Accelerate on macOS, OpenBLAS on Linux)
Metal GPU acceleration: Automatic on Apple Silicon Macs
Text-to-image: Generate images from text prompts
Image-to-image: Transform existing images guided by prompts
Integrated text encoder: Qwen3-4B encoder built-in, no external embedding computation needed
Memory efficient: Automatic encoder release after encoding (~8GB freed)

Generate an image from a text prompt:

./flux -d flux-klein-model -p "A fluffy orange cat sitting on a windowsill" -o cat.png

Transform an existing image based on a prompt:

./flux -d flux-klein-model -p "oil painting style" -i photo.png -o painting.png -t 0.7

The -t (strength) parameter controls how much the image changes:

0.0 = no change (output equals input)
1.0 = full generation (input only provides composition hint)
0.7 = good balance for style transfer
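
A common way to implement a strength parameter in rectified-flow models is to start sampling from a mix of the encoded input image and random noise, with the mix set by the strength. The sketch below illustrates that idea only. `blend_latent` is a hypothetical name that does not appear in flux.h, and whether flux2.c uses exactly this formula is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper, not part of flux.h: mixes an encoded input image
 * with noise. Assumes the straight-line rectified-flow interpolation
 * x_t = (1 - t) * x0 + t * noise, with t equal to the strength. */
void blend_latent(float *out, const float *x0, const float *noise,
                  size_t n, float strength) {
    /* Keep the strength inside the documented 0.0-1.0 range. */
    if (strength < 0.0f) strength = 0.0f;
    if (strength > 1.0f) strength = 1.0f;
    for (size_t i = 0; i < n; i++) {
        out[i] = (1.0f - strength) * x0[i] + strength * noise[i];
    }
}
```

With strength 0.0 the output is the input latent unchanged, and with 1.0 it is pure noise, which matches the two endpoints listed above.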

Required:

-d, --dir PATH        Path to model directory
-p, --prompt TEXT     Text prompt for generation
-o, --output PATH     Output image path (.png or .ppm)

Generation options:

-W, --width N         Output width in pixels (default: 256)
-H, --height N        Output height in pixels (default: 256)
-s, --steps N         Sampling steps (default: 4)
-S, --seed N          Random seed for reproducibility

Image-to-image options:

-i, --input PATH      Input image for img2img
-t, --strength N      How much to change the image, 0.0-1.0 (default: 0.75)

Output options:

-q, --quiet           Silent mode, no output
-v, --verbose         Show detailed config and timing info

Other options:

-e, --embeddings PATH Load pre-computed text embeddings (advanced)
-h, --help            Show help

The seed is always printed to stderr, even when random:

$ ./flux -d flux-klein-model -p "a landscape" -o out.png
Seed: 1705612345
out.png

To reproduce the same image, use the printed seed:

$ ./flux -d flux-klein-model -p "a landscape" -o out.png -S 1705612345
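
The behaviour described above (use the requested seed, or pick one and report it) can be sketched in a few lines. `resolve_seed` is a hypothetical helper, and deriving the random seed from the clock is an assumption; flux2.c may choose seeds differently.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: a negative request means "pick a seed for me".
 * The chosen seed is always reported on stderr so a run can be repeated. */
int64_t resolve_seed(int64_t requested) {
    int64_t seed = requested >= 0 ? requested : (int64_t)time(NULL);
    fprintf(stderr, "Seed: %lld\n", (long long)seed);
    return seed;
}
```

Printing to stderr rather than stdout keeps the seed out of any output that a script might capture from stdout.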

Choose a backend when building:

make            # Show available backends
make generic    # Pure C, no dependencies (slow)
make blas       # BLAS acceleration (~30x faster)
make mps        # Apple Silicon Metal GPU (fastest, macOS only)

Recommended:

macOS Apple Silicon: make mps
macOS Intel: make blas
Linux with OpenBLAS: make blas
Linux without OpenBLAS: make generic

For make blas on Linux, install OpenBLAS first:

# Ubuntu/Debian
sudo apt install libopenblas-dev

# Fedora
sudo dnf install openblas-devel

Other targets:

make clean      # Clean build artifacts
make info       # Show available backends for this platform
make test       # Run reference image test

The model weights are downloaded from HuggingFace:

pip install huggingface_hub
python download_model.py

This downloads approximately 16GB to ./flux-klein-model:

VAE (~300MB)
Transformer (~4GB)
Qwen3-4B Text Encoder (~8GB)
Tokenizer

FLUX.2-klein-4B is a rectified flow transformer optimized for fast inference:

Component	Architecture
Transformer	5 double blocks + 20 single blocks, 3072 hidden dim, 24 attention heads
VAE	AutoencoderKL, 128 latent channels, 8x spatial compression
Text Encoder	Qwen3-4B, 36 layers, 2560 hidden dim

Inference steps: This is a distilled model that produces good results with exactly 4 sampling steps.
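
To give a feel for what a sampling step is in a rectified flow model, here is a minimal Euler sampler. It is a sketch under stated assumptions: the time schedule is uniform from 1 (noise) to 0 (image), the names `euler_sample`, `velocity_fn`, and `toy_velocity` are hypothetical, and the schedule flux2.c actually uses may differ.

```c
#include <stddef.h>

/* Velocity callback: writes the predicted velocity for latent x at time t.
 * In a real pipeline this would be the transformer forward pass. */
typedef void (*velocity_fn)(const float *x, float t, float *v, size_t n,
                            void *user);

/* Euler integration from t = 1 (noise) to t = 0 (image) in equal steps.
 * v must point to scratch space for n floats. */
void euler_sample(float *x, float *v, size_t n, int steps,
                  velocity_fn model, void *user) {
    for (int i = 0; i < steps; i++) {
        float t      = 1.0f - (float)i / (float)steps;
        float t_next = 1.0f - (float)(i + 1) / (float)steps;
        model(x, t, v, n, user);
        for (size_t k = 0; k < n; k++) {
            x[k] += (t_next - t) * v[k];
        }
    }
}

/* Toy model for demonstration only: on a straight path from x0 to noise
 * the velocity is the constant (noise - x0), passed in through user. */
void toy_velocity(const float *x, float t, float *v, size_t n, void *user) {
    (void)x;
    (void)t;
    const float *c = (const float *)user;
    for (size_t k = 0; k < n; k++) v[k] = c[k];
}
```

With the toy model, a start value of 5 and a constant velocity of 3 reach 2 after the steps complete, whatever the step count. When the path is close to a straight line, a few Euler steps are enough, which is the intuition behind few-step distilled models.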

Phase	Memory
Text encoding	~8GB (encoder weights)
Diffusion	~8GB (transformer ~4GB + VAE ~300MB + activations)
Peak	~16GB (if encoder not released)

The text encoder is automatically released after encoding, reducing peak memory during diffusion. If you generate multiple images with different prompts, the encoder reloads automatically.
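
The arithmetic behind the table can be written down directly. This sketch assumes the two phases do not overlap once the encoder is released; `peak_memory_gb` is a hypothetical helper used only for illustration.

```c
/* Peak memory in GB: the larger phase if the encoder is released before
 * diffusion starts, otherwise both phases are resident at the same time. */
double peak_memory_gb(double encoder_gb, double diffusion_gb,
                      int encoder_released) {
    if (encoder_released) {
        return encoder_gb > diffusion_gb ? encoder_gb : diffusion_gb;
    }
    return encoder_gb + diffusion_gb;
}
```

With the figures from the table, `peak_memory_gb(8.0, 8.0, 1)` gives 8 and `peak_memory_gb(8.0, 8.0, 0)` gives 16.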

Maximum resolution: 1024x1024 pixels. Higher resolutions require prohibitive memory for the attention mechanisms.

Minimum resolution: 64x64 pixels.

Dimensions should be multiples of 16 (the VAE downsampling factor).
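
The constraints above are easy to enforce before calling into the library. The sketch below snaps a requested dimension to a supported value and gives a rough estimate of attention memory. Both helpers are hypothetical. The estimate assumes one image token per 16x16 pixel block, float32 scores, and a naive implementation that materializes the full score matrix for one layer, and it ignores text tokens.

```c
#include <stdint.h>

/* Limits taken from the text above. */
enum { DIM_MIN = 64, DIM_MAX = 1024, DIM_STEP = 16 };

/* Hypothetical helper: clamp to the supported range, then round to the
 * nearest multiple of 16. */
int snap_dimension(int n) {
    if (n < DIM_MIN) n = DIM_MIN;
    if (n > DIM_MAX) n = DIM_MAX;
    return (n + DIM_STEP / 2) / DIM_STEP * DIM_STEP;
}

/* Rough size in bytes of the attention score matrices for one layer,
 * under the assumptions stated in the lead-in. */
uint64_t attention_score_bytes(int width, int height, int heads) {
    uint64_t tokens = (uint64_t)(width / DIM_STEP) *
                      (uint64_t)(height / DIM_STEP);
    return tokens * tokens * (uint64_t)heads * sizeof(float);
}
```

Under these assumptions, 1024x1024 with 24 heads comes to about 1.5 GiB of scores per layer. Doubling both sides multiplies the token count by 4 and the score memory by 16, which illustrates why larger sizes become impractical.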

The library can be integrated into your own C/C++ projects. Link against libflux.a and include flux.h.

Here's a complete program that generates an image from a text prompt:

#include "flux.h"
#include <stdio.h>

int main(void) {
    /* Load the model. This loads VAE, transformer, and text encoder. */
    flux_ctx *ctx = flux_load_dir("flux-klein-model");
    if (!ctx) {
        fprintf(stderr, "Failed to load model: %s\n", flux_get_error());
        return 1;
    }

    /* Configure generation parameters. Start with defaults and customize. */
    flux_params params = FLUX_PARAMS_DEFAULT;
    params.width = 512;
    params.height = 512;
    params.seed = 42;  /* Use -1 for random seed */

    /* Generate the image. This handles text encoding, diffusion, and VAE decode. */
    flux_image *img = flux_generate(ctx, "A fluffy orange cat in a sunbeam", &params);
    if (!img) {
        fprintf(stderr, "Generation failed: %s\n", flux_get_error());
        flux_free(ctx);
        return 1;
    }

    /* Save to file. Format is determined by extension (.png or .ppm). */
    flux_image_save(img, "cat.png");
    printf("Saved cat.png (%dx%d)\n", img->width, img->height);

    /* Clean up */
    flux_image_free(img);
    flux_free(ctx);
    return 0;
}

Compile with:

gcc -o myapp myapp.c -L. -lflux -lm -framework Accelerate  # macOS
gcc -o myapp myapp.c -L. -lflux -lm -lopenblas              # Linux

Image-to-Image Transformation

Transform an existing image guided by a text prompt. The strength parameter controls how much the image changes:

#include "flux.h"
#include <stdio.h>

int main(void) {
    flux_ctx *ctx = flux_load_dir("flux-klein-model");
    if (!ctx) return 1;

    /* Load the input image */
    flux_image *photo = flux_image_load("photo.png");
    if (!photo) {
        fprintf(stderr, "Failed to load image\n");
        flux_free(ctx);
        return 1;
    }

    /* Set up parameters. Output size defaults to input size. */
    flux_params params = FLUX_PARAMS_DEFAULT;
    params.strength = 0.7;  /* 0.0 = no change, 1.0 = full regeneration */
    params.seed = 123;

    /* Transform the image */
    flux_image *painting = flux_img2img(ctx, "oil painting, impressionist style",
                                         photo, &params);
    flux_image_free(photo);  /* Done with input */

    if (!painting) {
        fprintf(stderr, "Transformation failed: %s\n", flux_get_error());
        flux_free(ctx);
        return 1;
    }

    flux_image_save(painting, "painting.png");
    printf("Saved painting.png\n");

    flux_image_free(painting);
    flux_free(ctx);
    return 0;
}

Strength values:

0.3 - Subtle style transfer, preserves most details
0.5 - Moderate transformation
0.7 - Strong transformation, good for style transfer
0.9 - Almost complete regeneration, keeps only composition

Generating Multiple Images

When generating multiple images with different seeds but the same prompt, you can avoid reloading the text encoder:

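flux_ctx *ctx = flux_load_dir("flux-klein-model");
if (!ctx) {
    fprintf(stderr, "Failed to load model: %s\n", flux_get_error());
    return 1;
}

flux_params params = FLUX_PARAMS_DEFAULT;
params.width = 256;
params.height = 256;

/* Generate 5 variations with different seeds */
for (int i = 0; i < 5; i++) {
    flux_set_seed(1000 + i);

    flux_image *img = flux_generate(ctx, "A mountain landscape at sunset", &params);
    if (!img) {
        /* Skip this variation rather than passing NULL to the save call */
        fprintf(stderr, "Generation failed: %s\n", flux_get_error());
        continue;
    }

    char filename[64];
    snprintf(filename, sizeof(filename), "landscape_%d.png", i);
    flux_image_save(img, filename);
    flux_image_free(img);
}

flux_free(ctx);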

Note: The text encoder (~8GB) is automatically released after the first generation to save memory. It reloads automatically if you use a different prompt.

All functions that can fail return NULL on error. Use flux_get_error() to get a description:

flux_ctx *ctx = flux_load_dir("nonexistent-model");
if (!ctx) {
    fprintf(stderr, "Error: %s\n", flux_get_error());
    /* Prints something like: "Failed to load VAE - cannot generate images" */
    return 1;
}
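
For readers unfamiliar with this error-reporting style, the pattern can be sketched with a static message buffer. The names `example_set_error` and `example_get_error` are hypothetical, and this is not a description of how flux2.c implements flux_get_error().

```c
#include <stdarg.h>
#include <stdio.h>

/* Last error message, overwritten by each failure. */
static char last_error[256] = "";

/* Hypothetical internal helper: record a formatted error message. */
void example_set_error(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(last_error, sizeof(last_error), fmt, ap);
    va_end(ap);
}

/* Accessor in the style of flux_get_error(): returns the last message. */
const char *example_get_error(void) {
    return last_error;
}
```

A single static buffer is simple but not thread-safe, so in a design like this the message should be read right after the failing call and before any other library call.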

Core functions:

flux_ctx *flux_load_dir(const char *model_dir);   /* Load model, returns NULL on error */
void flux_free(flux_ctx *ctx);                     /* Free all resources */

flux_image *flux_generate(flux_ctx *ctx, const char *prompt, const flux_params *params);
flux_image *flux_img2img(flux_ctx *ctx, const char *prompt, const flux_image *input,
                          const flux_params *params);

Image handling:

flux_image *flux_image_load(const char *path);     /* Load PNG or PPM */
int flux_image_save(const flux_image *img, const char *path);  /* 0=success, -1=error */
flux_image *flux_image_resize(const flux_image *img, int new_w, int new_h);
void flux_image_free(flux_image *img);
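
PPM is simple enough that a writer fits in a few lines, which is one reason it is a convenient dependency-free output format. The sketch below writes raw 8-bit RGB data as binary PPM (P6) and checks a file extension. `write_ppm` and `has_extension` are hypothetical helpers. They take a raw pixel buffer because the layout of flux_image is not shown here.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if path ends with ext (case-sensitive), otherwise 0. */
int has_extension(const char *path, const char *ext) {
    size_t lp = strlen(path), le = strlen(ext);
    return lp >= le && strcmp(path + lp - le, ext) == 0;
}

/* Write width*height 8-bit RGB pixels as a binary PPM (P6) file.
 * Returns 0 on success and -1 on error. */
int write_ppm(const char *path, int width, int height,
              const unsigned char *rgb) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t count = (size_t)width * (size_t)height * 3;
    /* Header: magic number, dimensions, maximum channel value. */
    int ok = fprintf(f, "P6\n%d %d\n255\n", width, height) > 0 &&
             fwrite(rgb, 1, count, f) == count;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}
```

A caller could pick the writer with `has_extension(path, ".ppm")` and fall back to PNG otherwise.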

Utilities:

void flux_set_seed(int64_t seed);                  /* Set RNG seed for reproducibility */
const char *flux_get_error(void);                  /* Get last error message */
void flux_release_text_encoder(flux_ctx *ctx);     /* Manually free ~8GB (optional) */

typedef struct {
    int width;              /* Output width in pixels (default: 256) */
    int height;             /* Output height in pixels (default: 256) */
    int num_steps;          /* Denoising steps, use 4 for klein (default: 4) */
    float guidance_scale;   /* CFG scale, use 1.0 for klein (default: 1.0) */
    int64_t seed;           /* Random seed, -1 for random (default: -1) */
    float strength;         /* img2img only: 0.0-1.0 (default: 0.75) */
} flux_params;

/* Initialize with sensible defaults */
#define FLUX_PARAMS_DEFAULT { 256, 256, 4, 1.0f, -1, 0.75f }

License: MIT