Show HN: Run TRELLIS.2 Image-to-3D generation natively on Apple Silicon

Original link: https://github.com/shivampkumar/trellis-mac

## TRELLIS.2 on Mac: Image-to-3D Generation Without NVIDIA

This project brings Microsoft's state-of-the-art TRELLIS.2 image-to-3D model to Apple Silicon Macs (M1 or later), with *no* NVIDIA GPU required. It is a port built on PyTorch MPS that generates a detailed (400K+ vertex) 3D mesh from a single image in roughly 3.5 minutes on an M4 Pro.

The port replaces the CUDA-dependent sparse convolution, mesh extraction, and attention libraries with PyTorch/Python equivalents. While it is slower (~10x) than the original CUDA implementation, it delivers usable 3D generation.

**Key points:** Outputs OBJ and GLB files with vertex colors. Requires Python 3.11+, 24GB+ unified memory, and ~15GB of disk space. Note that texture export and hole filling are currently disabled due to CUDA dependencies.

**Getting started:** Clone the repository, log in to HuggingFace to access the gated model weights, run the `setup.sh` script, then use `python generate.py path/to/image.png`.

Developer shivampkumar has ported Microsoft's TRELLIS.2, a 4-billion-parameter image-to-3D model, to run on Apple Silicon Macs, crucially with *no* NVIDIA GPU. The original model depends on CUDA and NVIDIA-specific libraries; this port swaps in pure-PyTorch replacements, including custom sparse 3D convolution and attention, with only a few hundred lines of changes. The result generates a 3D mesh (~400K vertices) from a single image in about 3.5 minutes on an M4 Pro. Although slower than running on an H100 GPU, it offers the significant advantages of working offline and removing any cloud dependency. The code is available on GitHub: [https://github.com/shivampkumar/trellis-mac](https://github.com/shivampkumar/trellis-mac).

Original text

Run TRELLIS.2 image-to-3D generation natively on Mac.

This is a port of Microsoft's TRELLIS.2 — a state-of-the-art image-to-3D model — from CUDA-only to Apple Silicon via PyTorch MPS. No NVIDIA GPU required.

Generates 400K+ vertex meshes from single images in ~3.5 minutes on M4 Pro.

Output includes vertex-colored OBJ and GLB files ready for use in 3D applications.
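As a quick sanity check, the exported GLB can be opened with a general-purpose mesh library such as trimesh (not used by this repo); a minimal sketch:

```python
# Optional sanity check of the exported mesh (assumes `pip install trimesh`;
# trimesh is not installed by this repo's setup.sh).
import trimesh

mesh = trimesh.load("output_3d.glb", force="mesh")
print(len(mesh.vertices), "vertices,", len(mesh.faces), "faces")
print(type(mesh.visual))  # vertex colors only; no texture maps in this port
```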

  • macOS on Apple Silicon (M1 or later)
  • Python 3.11+
  • 24GB+ unified memory recommended (the 4B model is large)
  • ~15GB disk space for model weights (downloaded on first run)
```bash
# Clone this repo
git clone https://github.com/shivampkumar/trellis-mac.git
cd trellis-mac

# Log into HuggingFace (needed for gated model weights)
hf auth login

# Request access to these gated models (usually instant approval):
#   https://huggingface.co/facebook/dinov3-vitl16-pretrain-lvd1689m
#   https://huggingface.co/briaai/RMBG-2.0

# Run setup (creates venv, installs deps, clones & patches TRELLIS.2)
bash setup.sh

# Activate the environment
source .venv/bin/activate

# Generate a 3D model from an image
python generate.py path/to/image.png
```

Output files are saved to the current directory (or use --output to specify a path).

```bash
# Basic usage
python generate.py photo.png

# With options
python generate.py photo.png --seed 123 --output my_model --pipeline-type 512

# All options
python generate.py --help
```
| Option | Default | Description |
|---|---|---|
| `--seed` | 42 | Random seed for generation |
| `--output` | output_3d | Output filename (without extension) |
| `--pipeline-type` | 512 | Pipeline resolution: `512`, `1024`, `1024_cascade` |

TRELLIS.2 depends on several CUDA-only libraries. This port replaces them with pure-PyTorch and pure-Python alternatives:

| Original (CUDA) | Replacement | Purpose |
|---|---|---|
| flex_gemm | backends/conv_none.py | Sparse 3D convolution via gather-scatter |
| o_voxel._C hashmap | backends/mesh_extract.py | Mesh extraction from dual voxel grid |
| flash_attn | PyTorch SDPA | Scaled dot-product attention for sparse transformers |
| cumesh | Stub (graceful skip) | Hole filling, mesh simplification |
| nvdiffrast | Stub | Differentiable rasterization (texture export) |

Additionally, all hardcoded .cuda() calls throughout the codebase were patched to use the active device instead.
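Schematically, that device patch boils down to selecting the active device once and moving tensors and modules to it (a minimal sketch, not the repo's literal diff):

```python
import torch

# Pick the active device once instead of hardcoding .cuda() at call sites.
# (Illustrative only; the actual patch edits call sites throughout TRELLIS.2.)
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(4, 8).to(device)          # replaces the hardcoded x.cuda()
model = torch.nn.Linear(8, 8).to(device)  # replaces model.cuda()
print(model(x).device)                    # mps:0 on Apple Silicon, cpu otherwise
```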

Sparse 3D Convolution (backends/conv_none.py): Implements submanifold sparse convolution by building a spatial hash of active voxels, gathering neighbor features for each kernel position, applying weights via matrix multiplication, and scatter-adding results back. Neighbor maps are cached per-tensor to avoid redundant computation.
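A toy sketch of that gather-scatter scheme (hypothetical function name; the real backends/conv_none.py additionally caches the neighbor maps per tensor, as noted above):

```python
import torch

def sparse_conv3d(coords, feats, weight, kernel_size=3):
    """Submanifold sparse 3D convolution via gather-scatter (illustrative).

    coords: (N, 3) integer coordinates of active voxels
    feats:  (N, C_in) features of the active voxels
    weight: (kernel_size**3, C_in, C_out), one weight matrix per kernel offset
    """
    N = feats.shape[0]
    c_out = weight.shape[-1]
    out = feats.new_zeros(N, c_out)

    # Spatial hash: coordinate tuple -> row index of the active voxel.
    table = {tuple(c.tolist()): i for i, c in enumerate(coords)}

    r = kernel_size // 2
    offsets = [(dx, dy, dz)
               for dx in range(-r, r + 1)
               for dy in range(-r, r + 1)
               for dz in range(-r, r + 1)]

    for k, off in enumerate(offsets):
        # For each output voxel, look up the neighbor at coord + offset.
        src, dst = [], []
        for i, c in enumerate(coords.tolist()):
            j = table.get((c[0] + off[0], c[1] + off[1], c[2] + off[2]))
            if j is not None:
                src.append(j)   # gather from the neighbor
                dst.append(i)    # scatter into the center voxel
        if not src:
            continue
        gathered = feats[torch.tensor(src)]            # (M, C_in)
        contrib = gathered @ weight[k]                 # (M, C_out)
        out.index_add_(0, torch.tensor(dst), contrib)  # scatter-add
    return out

# Tiny example: 3 active voxels, 8 input channels, 16 output channels.
coords = torch.tensor([[0, 0, 0], [1, 0, 0], [0, 1, 0]])
feats = torch.randn(3, 8)
weight = torch.randn(27, 8, 16)
print(sparse_conv3d(coords, feats, weight).shape)  # torch.Size([3, 16])
```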

Mesh Extraction (backends/mesh_extract.py): Reimplements flexible_dual_grid_to_mesh using Python dictionaries instead of CUDA hashmap operations. Builds a coordinate-to-index lookup table, finds connected voxels for each edge, and triangulates quads using normal alignment heuristics.
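Two of the core ideas, a plain-dictionary coordinate lookup and the normal-alignment triangulation, can be sketched roughly as follows (illustrative names, not the repo's exact code):

```python
import numpy as np

# Coordinate-to-index lookup table over active voxels, standing in for the
# CUDA hashmap used by o_voxel._C (illustrative data, not real output).
voxel_coords = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]])
coord_to_idx = {tuple(c): i for i, c in enumerate(voxel_coords.tolist())}

def quads_to_triangles(quads, vertices):
    """Split each quad into two triangles along the diagonal whose two
    triangle normals agree best (the normal-alignment heuristic)."""
    tris = []
    for a, b, c, d in quads:
        va, vb, vc, vd = vertices[a], vertices[b], vertices[c], vertices[d]
        # Split along diagonal a-c vs. diagonal b-d.
        n1a, n1b = np.cross(vb - va, vc - va), np.cross(vc - va, vd - va)
        n2a, n2b = np.cross(vc - vb, vd - vb), np.cross(vd - vb, va - vb)
        if np.dot(n1a, n1b) >= np.dot(n2a, n2b):
            tris += [(a, b, c), (a, c, d)]
        else:
            tris += [(b, c, d), (b, d, a)]
    return tris

verts = voxel_coords.astype(float)
print(quads_to_triangles([(0, 1, 2, 3)], verts))  # two triangles per quad
```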

Attention (patched full_attn.py): Adds an SDPA backend to the sparse attention module. Pads variable-length sequences into batches, runs torch.nn.functional.scaled_dot_product_attention, then unpads results.
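In outline, the SDPA path pads the packed sequences into a batch, applies a key-padding mask, and unpads afterwards; a hedged sketch under those assumptions (not the patched module's exact code):

```python
import torch
import torch.nn.functional as F

def varlen_attention(q, k, v, seq_lens):
    """Attention over variable-length sequences via pad -> SDPA -> unpad.

    q, k, v: (total_tokens, heads, dim) packed sequences
    seq_lens: list of per-sequence lengths summing to total_tokens
    """
    heads, dim = q.shape[1], q.shape[2]
    B, max_len = len(seq_lens), max(seq_lens)

    # Pad into (B, heads, max_len, dim) and build a boolean key-padding mask.
    qb = q.new_zeros(B, heads, max_len, dim)
    kb, vb = torch.zeros_like(qb), torch.zeros_like(qb)
    mask = torch.zeros(B, 1, 1, max_len, dtype=torch.bool, device=q.device)
    start = 0
    for b, n in enumerate(seq_lens):
        chunk = slice(start, start + n)
        qb[b, :, :n] = q[chunk].transpose(0, 1)
        kb[b, :, :n] = k[chunk].transpose(0, 1)
        vb[b, :, :n] = v[chunk].transpose(0, 1)
        mask[b, 0, 0, :n] = True
        start += n

    out = F.scaled_dot_product_attention(qb, kb, vb, attn_mask=mask)

    # Unpad back to the packed (total_tokens, heads, dim) layout.
    pieces = [out[b, :, :n].transpose(0, 1) for b, n in enumerate(seq_lens)]
    return torch.cat(pieces, dim=0)

# Two sequences of lengths 3 and 4 packed into 7 tokens.
q = k = v = torch.randn(7, 4, 16)
print(varlen_attention(q, k, v, [3, 4]).shape)  # torch.Size([7, 4, 16])
```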

Benchmarks on M4 Pro (24GB), pipeline type 512:

| Stage | Time |
|---|---|
| Model loading | ~45s |
| Image preprocessing | ~5s |
| Sparse structure sampling | ~15s |
| Shape SLat sampling | ~90s |
| Texture SLat sampling | ~50s |
| Mesh decoding | ~30s |
| **Total** | **~3.5 min** |

Memory usage peaks at around 18GB unified memory during generation.

  • No texture export: Texture baking requires nvdiffrast (CUDA-only differentiable rasterizer). Meshes export with vertex colors only.
  • Hole filling disabled: Mesh hole filling requires cumesh (CUDA). Meshes may have small holes.
  • Slower than CUDA: The pure-PyTorch sparse convolution is ~10x slower than the CUDA flex_gemm kernel. This is the main bottleneck.
  • No training support: Inference only.

The porting code in this repository (backends, patches, scripts) is released under the MIT License.

Upstream model weights are subject to their own licenses:

  • TRELLIS.2 by Microsoft Research — the original model and codebase
  • DINOv3 by Meta — image feature extraction
  • RMBG-2.0 by BRIA AI — background removal