TRELLIS.2: state-of-the-art large 3D generative model (4B)

Original link: https://github.com/microsoft/TRELLIS.2

## TRELLIS.2: High-Fidelity Image-to-3D Generation

TRELLIS.2 is a new 4-billion-parameter large 3D generative model that creates high-resolution, fully textured 3D assets from 2D images with remarkable speed and efficiency. It leverages a novel "field-free" sparse voxel structure called O-Voxel, which can generate complex topologies, including open surfaces and internal structures, with full PBR material support (color, roughness, metallic, opacity). The model reaches resolutions up to 1536³ in roughly 60 seconds on an NVIDIA H100 GPU. A key strength is fast conversion between textured meshes and O-Voxel (<10 s on a single CPU for mesh → O-Voxel, <100 ms with CUDA for O-Voxel → mesh). TRELLIS.2 is built on specialized packages such as O-Voxel, FlexGEMM, and CuMesh for optimized performance. The code is available on GitHub (Linux only; requires an NVIDIA GPU with ≥24GB of memory and CUDA Toolkit 12.4). Pretrained models are available via Hugging Face. A web demo and PBR texture generation are coming soon. The project is released under the MIT License, with some dependencies under separate license terms.


Paper | Hugging Face | Project Page | License

trellis2.mp4

(Compressed version due to GitHub size limits. See the full-quality video on our project page!)

TRELLIS.2 is a state-of-the-art large 3D generative model (4B parameters) designed for high-fidelity image-to-3D generation. It leverages a novel "field-free" sparse voxel structure termed O-Voxel to reconstruct and generate arbitrary 3D assets with complex topologies, sharp features, and full PBR materials.

1. High Quality, Resolution & Efficiency

Our 4B-parameter model generates high-resolution fully textured assets with exceptional fidelity and efficiency using vanilla DiTs. It utilizes a Sparse 3D VAE with 16× spatial downsampling to encode assets into a compact latent space.
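
To make the compression concrete, here is the back-of-the-envelope arithmetic for the 16× per-axis downsampling (an illustration only; the latent channel count and layout are not specified here):

# Illustrative arithmetic only: the 16x downsampling applies per spatial axis.
for res in (512, 1024, 1536):
    latent = res // 16
    print(f"{res}^3 voxels -> {latent}^3 latent cells ({(res // latent) ** 3}x fewer positions)")
# 512^3 -> 32^3, 1024^3 -> 64^3, 1536^3 -> 96^3 (4096x fewer spatial positions each)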

| Resolution | Total Time* | Breakdown (Shape + Mat) |
|------------|-------------|-------------------------|
| 512³       | ~3s         | 2s + 1s                 |
| 1024³      | ~17s        | 10s + 7s                |
| 1536³      | ~60s        | 35s + 25s               |

*Tested on NVIDIA H100 GPU.

2. Arbitrary Topology Handling

The O-Voxel representation breaks the limits of iso-surface fields. It robustly handles complex structures without lossy conversion:

  • Open Surfaces (e.g., clothing, leaves)
  • Non-manifold Geometry
  • Internal Enclosed Structures

Beyond basic colors, TRELLIS.2 models arbitrary surface attributes including Base Color, Roughness, Metallic, and Opacity, enabling photorealistic rendering and transparency support.

Data processing is streamlined for instant conversions that are fully rendering-free and optimization-free.

  • < 10s (Single CPU): Textured Mesh → O-Voxel
  • < 100ms (CUDA): O-Voxel → Textured Mesh
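
Conceptually, an O-Voxel asset pairs a sparse set of occupied voxel coordinates with per-voxel surface attributes. The sketch below only illustrates that idea; the field names mirror what the generation pipeline exposes later in this README (coords, attrs, layout, voxel_size), but the class itself is not the actual o_voxel data structure.

from dataclasses import dataclass
import torch

@dataclass
class OVoxelSketch:
    """Illustrative container only; not the real o_voxel class."""
    coords: torch.Tensor   # (N, 3) integer coordinates of occupied sparse voxels
    attrs: torch.Tensor    # (N, C) surface attributes (base color, roughness, metallic, opacity, ...)
    layout: dict           # maps attribute names to channel slices within `attrs`
    voxel_size: float      # edge length of one voxel in the normalized [-0.5, 0.5]^3 bounding box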

Prerequisites

  • System: The code is currently tested only on Linux.
  • Hardware: An NVIDIA GPU with at least 24GB of memory is necessary. The code has been verified on NVIDIA A100 and H100 GPUs.
  • Software:
    • The CUDA Toolkit is needed to compile certain packages. Recommended version is 12.4.
    • Conda is recommended for managing dependencies.
    • Python version 3.8 or higher is required.
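
Once PyTorch is available, a quick sanity check of the GPU requirement can look like this (a minimal sketch; adjust the device index if you have multiple GPUs):

import torch

assert torch.cuda.is_available(), "An NVIDIA GPU with CUDA support is required."
props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, memory: {total_gb:.1f} GB")
assert total_gb >= 24, "TRELLIS.2 expects at least 24 GB of GPU memory."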

Installation

  1. Clone the repo:

    git clone -b main https://github.com/microsoft/TRELLIS.2.git --recursive
    cd TRELLIS.2
  2. Install the dependencies:

    Before running the following command, there are some things to note:

    • By adding --new-env, a new conda environment named trellis2 will be created. If you want to use an existing conda environment, please remove this flag.
    • By default the trellis2 environment will use pytorch 2.6.0 with CUDA 12.4. If you want to use a different version of CUDA, you can remove the --new-env flag and manually install the required dependencies. Refer to PyTorch for the installation command.
    • If you have multiple CUDA Toolkit versions installed, CUDA_HOME should be set to the correct version before running the command. For example, if you have CUDA Toolkit 12.4 and 13.0 installed, you can run export CUDA_HOME=/usr/local/cuda-12.4 before running the command.
    • By default, the code uses the flash-attn backend for attention. For GPUs that do not support flash-attn (e.g., NVIDIA V100), you can install xformers manually and set the ATTN_BACKEND environment variable to xformers before running the code. See the Minimal Example for more details.
    • The installation may take a while due to the large number of dependencies. Please be patient. If you encounter any issues, you can try to install the dependencies one by one, specifying one flag at a time.
    • If you encounter any issues during the installation, feel free to open an issue or contact us.

    Create a new conda environment named trellis2 and install the dependencies:

    . ./setup.sh --new-env --basic --flash-attn --nvdiffrast --nvdiffrec --cumesh --o-voxel --flexgemm

    The detailed usage of setup.sh can be found by running . ./setup.sh --help.

    Usage: setup.sh [OPTIONS]
    Options:
        -h, --help              Display this help message
        --new-env               Create a new conda environment
        --basic                 Install basic dependencies
        --flash-attn            Install flash-attention
        --cumesh                Install cumesh
        --o-voxel               Install o-voxel
        --flexgemm              Install flexgemm
        --nvdiffrast            Install nvdiffrast
        --nvdiffrec             Install nvdiffrec

The pretrained model TRELLIS.2-4B is available on Hugging Face. Please refer to the model card there for more details.

| Model        | Parameters | Resolution   | Link         |
|--------------|------------|--------------|--------------|
| TRELLIS.2-4B | 4 Billion  | 512³ - 1536³ | Hugging Face |
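
If you prefer to fetch the weights ahead of time, the standard Hugging Face Hub download works; this is optional, since the pipeline's from_pretrained call in the example below also downloads the model on first use.

from huggingface_hub import snapshot_download

# Downloads (or reuses) the TRELLIS.2-4B weights in the local Hugging Face cache.
local_dir = snapshot_download("microsoft/TRELLIS.2-4B")
print("Model files cached at:", local_dir)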

1. Image to 3D Generation

Here is an example of how to use the pretrained models for 3D asset generation.

import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # Can save GPU memory
import cv2
import imageio
from PIL import Image
import torch
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.utils import render_utils
from trellis2.renderers import EnvMap
import o_voxel

# 1. Setup Environment Map
envmap = EnvMap(torch.tensor(
    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
    dtype=torch.float32, device='cuda'
))

# 2. Load Pipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# 3. Load Image & Run
image = Image.open("assets/example_image/T.png")
mesh = pipeline.run(image)[0]
mesh.simplify(16777216)  # cap at 2**24 faces, the nvdiffrast limit

# 4. Render Video
video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
imageio.mimsave("sample.mp4", video, fps=15)

# 5. Export to GLB
glb = o_voxel.postprocess.to_glb(
    vertices            =   mesh.vertices,
    faces               =   mesh.faces,
    attr_volume         =   mesh.attrs,
    coords              =   mesh.coords,
    attr_layout         =   mesh.layout,
    voxel_size          =   mesh.voxel_size,
    aabb                =   [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target   =   1000000,
    texture_size        =   4096,
    remesh              =   True,
    remesh_band         =   1,
    remesh_project      =   0,
    verbose             =   True
)
glb.export("sample.glb", extension_webp=True)

Upon execution, the script generates the following files:

  • sample.mp4: A video visualizing the generated 3D asset with PBR materials and environmental lighting.
  • sample.glb: The extracted PBR-ready 3D asset in GLB format.

Note: The .glb file is exported in OPAQUE mode by default. Although the alpha channel is preserved within the texture map, it is not active initially. To enable transparency, import the asset into your 3D software and manually connect the texture's alpha channel to the material's opacity or alpha input.
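
If you prefer a programmatic route, the material's alpha mode can be switched from OPAQUE to blending with a generic glTF tool. The snippet below uses trimesh as one possible option and assumes it can read the exported file (including its WebP textures); treat it as a sketch, not part of the TRELLIS.2 API.

import trimesh

# Load the exported GLB and enable alpha blending on its PBR materials.
scene = trimesh.load("sample.glb")
for geom in scene.geometry.values():
    material = geom.visual.material
    if isinstance(material, trimesh.visual.material.PBRMaterial):
        material.alphaMode = "BLEND"  # use the alpha channel already stored in the texture
scene.export("sample_blend.glb")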

app.py provides a simple web demo for image-to-3D asset generation. You can run the demo with the following command:

    python app.py

Then, you can access the demo at the address shown in the terminal.

2. PBR Texture Generation

Will be released soon. Please stay tuned!

TRELLIS.2 is built upon several specialized high-performance packages developed by our team:

  • O-Voxel: Core library handling the logic for converting between textured meshes and the O-Voxel representation, ensuring instant bidirectional transformation.
  • FlexGEMM: Efficient sparse convolution implementation based on Triton, enabling rapid processing of sparse voxel structures.
  • CuMesh: CUDA-accelerated mesh utilities used for high-speed post-processing, remeshing, decimation, and UV-unwrapping.

This model and code are released under the MIT License.

Please note that certain dependencies operate under separate license terms:

  • nvdiffrast: Utilized for rendering generated 3D assets. This package is governed by its own License.

  • nvdiffrec: Implements the split-sum renderer for PBR materials. This package is governed by its own License.

If you find this model useful for your research, please cite our work:

@article{
    xiang2025trellis2,
    title={Native and Compact Structured Latents for 3D Generation},
    author={Xiang, Jianfeng and Chen, Xiaoxue and Xu, Sicheng and Wang, Ruicheng and Lv, Zelong and Deng, Yu and Zhu, Hongyuan and Dong, Yue and Zhao, Hao and Yuan, Nicholas Jing and Yang, Jiaolong},
    journal={Tech report},
    year={2025}
}