DiffusionBench: Towards Holistic Evaluation of Generative Diffusion Transformers

Original link: https://github.com/End2End-Diffusion/diffusion-bench

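**DiffusionBench** is a comprehensive, unified codebase that aims to move beyond traditional ImageNet-only evaluation by providing holistic benchmarking for diffusion Transformers. Through a single, streamlined interface, it supports training and evaluation across multiple generation tasks, including ImageNet (class-conditional) and text-to-image (T2I). The codebase provides a library of modular components, including:

* **Encoders and latent spaces:** more than 30 RAEs, VAEs, and representation encoders (such as DINOv2 and SigLIP2).
* **Architectures and objectives:** several output-prediction types, flow-matching techniques, and architectures such as LightningDiT and JiT.
* **Evaluation metrics:** in addition to the standard FID/IS metrics, it integrates benchmarks such as GenEval, DPGBench, and VQAScore.

DiffusionBench uses a staged training workflow (tokenizer training first, then diffusion modeling) and ships preconfigured settings for reproduction and experiment logging. The project emphasizes extensibility and reproducibility, and welcomes community contributions of new evaluation axes, metrics, and model checkpoints to build more robust standards for generative AI research.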


##############################################################################
#                                                                            #
#   ____  _  __  __           _                            .-----------.     #
#  |  _ \(_)/ _|/ _|_   _ ___(_) ___  _ __                 |           |     #
#  | | | | | |_| |_| | | / __| |/ _ \| '_ \                | ░▒▓█▓▒░▒▓ |     #
#  | |_| | |  _|  _| |_| \__ \ | (_) | | | |               | ▒▓█████▓▒ |     #
#  |____/|_|_| |_|  \__,_|___/_|\___/|_| |_|               | ▓███████▓ |     #
#                                                          |     ↓     |     #
#   ____                  _                                | █████████ |     #
#  | __ )  ___ _ __   ___| |__                             | ▓███████▓ |     #
#  |  _ \ / _ \ '_ \ / __| '_ \                            | ▒▓█████▓▒ |     #
#  | |_) |  __/ | | | (__| | | |                           |           |     #
#  |____/ \___|_| |_|\___|_| |_|                           '-----------'     #
#                                                                            #
#           Because ImageNet evaluation alone is no longer enough!           #
#                                                                            #
##############################################################################

📣 Announcement post: Call for DiffusionBench: A Holistic Benchmark for Diffusion Transformers. Help us grow the benchmark with new evaluation axes, new metrics, and faithful reproductions of published methods.

This repo contains the unified codebase for DiffusionBench. It supports training and evaluation across different generation tasks (ImageNet, T2I, ...) through a single interface. Please see the sections below for the detailed structure. Come join us!

_Text-to-image samples at 256×256 from models trained for 200K iterations using DiffusionBench._

# install uv project manager (if you don't already have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# install dependencies
uv sync

# prepare data
uv run python scripts/prepare.py --data {all,imagenet,t2i,eval}

# download pretrained models
uv run hf download diffusion-bench/diffusion-bench --local-dir pretrained_models --exclude .gitattributes

Reproduction flow: Stage 1 → Stage 2. Set these environment variables first (used for the output directory and W&B logging):

export EXPERIMENT_NAME=<run-name>
export ENTITY=<wandb-entity>
export PROJECT=<wandb-project>
export WANDB_KEY=<key>
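Because a missing variable would only show up once a run is already launching, it can help to check the four variables first. The following is a minimal sketch in POSIX shell; the `missing_env` function is a hypothetical helper and not part of DiffusionBench.

```shell
# Hypothetical helper (not part of DiffusionBench): print the names of the
# given variables that are unset or empty, and return non-zero if any are.
missing_env() {
    missing=""
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    # Drop the leading space before printing the collected names
    printf '%s\n' "${missing# }"
    [ -z "$missing" ]
}

# Warn before launching if any variable from this README is missing
if ! unset_vars=$(missing_env EXPERIMENT_NAME ENTITY PROJECT WANDB_KEY); then
    echo "unset or empty: $unset_vars" >&2
fi
```

The check only warns; whether to abort the launch on a missing variable is left to the caller.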

Stage 1. Train the RAE tokenizer:

uv run torchrun --standalone --nproc_per_node=8 \
    src/train_stage1.py \
    --config [STAGE1_CONFIG_PATH] \
    --results-dir results/stage1 --precision bf16 --compile --wandb

Stage 2. Train the diffusion model on VAE/RAE/Pixel space:

uv run torchrun --standalone --nproc_per_node=8 \
    src/train.py \
    --config [STAGE2_CONFIG_PATH] \
    --results-dir results/stage2 --precision bf16 --compile --wandb

Stage 2 training configs run online evaluation during training (the `eval:` block). For standalone evaluation of a released checkpoint, use the `sampling/` configs: each one embeds `stage_2.ckpt` (pointing into `pretrained_models/`) and the eval-time guidance, so the weights load automatically:

export EXPERIMENT_NAME=<run-name>

# stage 1 reconstruction (rFID/PSNR/SSIM/LPIPS)
uv run torchrun --nproc_per_node=8 src/offline_eval_stage1.py --config [STAGE1_CONFIG_PATH]

# stage 2 generation (FID/IS, GenEval/DPGBench/...)
uv run torchrun --nproc_per_node=8 src/offline_eval.py --config [STAGE2_CONFIG_PATH]
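To evaluate several released checkpoints in one go, the Stage 2 command can be repeated once per sampling config. The sketch below is a dry run that only prints the commands. It assumes the configs are `.yaml` files sitting directly inside a task directory such as `configs/stage2/sampling/imagenet`, and `print_eval_commands` is a hypothetical helper, not a script shipped with the repo.

```shell
# Hypothetical helper: print one offline-eval command per .yaml config
# found directly inside the given directory (dry run, nothing is launched).
print_eval_commands() {
    dir=$1
    for cfg in "$dir"/*.yaml; do
        # Skip the literal pattern when the glob matched no files
        [ -e "$cfg" ] || continue
        printf 'uv run torchrun --nproc_per_node=8 src/offline_eval.py --config %s\n' "$cfg"
    done
}

# Assumed location, taken from the config layout in this README
print_eval_commands configs/stage2/sampling/imagenet
```

Piping the output to `sh` would run the evaluations sequentially; reviewing the printed commands first is the safer default.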

configs/
├── stage1/
└── stage2/
    ├── training/
    │   ├── imagenet/
    │   └── t2i/
    └── sampling/
        ├── imagenet/
        └── t2i/

Stage 2 spans VAE (11), RAE (6), REG (4), and Pixel (3) families, identical across ImageNet and T2I. Swap any config between tasks with a single path change. The sampling/ set mirrors training/ but adds the trained checkpoint and eval-time guidance, so it runs offline eval directly.
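The single path change can be sketched as a substitution on the task directory name. This is an illustration only: `swap_task` is a hypothetical helper and `example.yaml` is a placeholder file name, not an actual config in the repo.

```shell
# Hypothetical helper: rewrite the task directory (imagenet or t2i) in a
# config path, leaving the rest of the path untouched.
swap_task() {
    # $1: config path, $2: target task directory name (imagenet or t2i)
    printf '%s\n' "$1" | sed -E "s#/(imagenet|t2i)/#/$2/#"
}

swap_task configs/stage2/training/imagenet/example.yaml t2i
# prints configs/stage2/training/t2i/example.yaml
```

The same substitution applies under `sampling/`, since only the task directory component differs.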

For ImageNet, pick the CFG-off baseline ([STAGE2_CONFIG_PATH].yaml) or the per-model best-CFG variant ([STAGE2_CONFIG_PATH]-cfg<scale>-t0.0-0.9.yaml).
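The two naming patterns above can be written out as a small helper. This is a sketch under assumptions: `imagenet_sampling_config` is a hypothetical function, the scale `1.5` is an arbitrary example rather than a recommended value, and the `-t0.0-0.9` suffix is copied verbatim from the pattern in this README.

```shell
# Hypothetical helper: build the ImageNet sampling config name from a
# config path given without the .yaml extension.
imagenet_sampling_config() {
    base=$1       # stands in for [STAGE2_CONFIG_PATH]
    scale=${2:-}  # CFG scale; leave empty for the CFG-off baseline
    if [ -z "$scale" ]; then
        printf '%s.yaml\n' "$base"
    else
        printf '%s-cfg%s-t0.0-0.9.yaml\n' "$base" "$scale"
    fi
}

imagenet_sampling_config path/to/config       # prints path/to/config.yaml
imagenet_sampling_config path/to/config 1.5   # prints path/to/config-cfg1.5-t0.0-0.9.yaml
```

The best-CFG scale differs per model, so the actual file names should be checked against the files in the `sampling/imagenet/` directory.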

| Category | Methods |
| --- | --- |
| Latent Space | `Pixel Space`; `RAE` (30+ representation encoders): `DINOv2` `SigLIP2` `WebSSL` `PE` `LangPE` and more; `RAEv2` (30+ representation encoders): `DINOv2` `SigLIP2` `WebSSL` `PE` `LangPE` and more; `VAE` (10+ VAEs): `FLUX.2` `FLUX.1` `SD3.5` `VA-VAE` `E2E-VAE` and more |
| Output Prediction | `x-prediction` `v-prediction` |
| Transport | `Rectified-Flow` `MeanFlow` `Improved-MeanFlow` `Pixel-MeanFlow` `Drifting` |
| Loss | `Flow Matching` `REPA` `iREPA` |
| Architecture | `LightningDiT` `JiT` `DDT` |
| Tasks | `ImageNet`: class-conditional generation; `T2I`: text-to-image generation |
| Evaluation | ImageNet: `FID` `IS`; T2I: `GenEval` `DPGBench` `GenAIBench` `VQAScore` |
| Training Backend | `DDP` `FSDP [TODO]` |

| Feature | Status | Details |
| --- | --- | --- |
| Coding Agents | Yes | Agent-compatible. See `skills/` for setup and workflow skills. |
| AutoResearch | [TODO] | AutoResearch integration is planned (not yet available). |

We welcome contributions! Please refer to docs/contributors.md and docs/contributing.md for further details.

The codebase is built upon some amazing projects:

We thank the authors for making their work publicly available.
