Show HN: Run TRELLIS.2 Image-to-3D generation natively on Apple Silicon

Original link: https://github.com/shivampkumar/trellis-mac

## TRELLIS.2 on Mac: Image-to-3D Generation Without NVIDIA

This project brings Microsoft's state-of-the-art TRELLIS.2 image-to-3D model to Apple Silicon Macs (M1 or later), with *no* NVIDIA GPU required. It is a port built on PyTorch MPS that generates a detailed (400K+ vertex) 3D mesh from a single image in roughly 3.5 minutes on an M4 Pro.

The port replaces the CUDA-dependent sparse convolution, mesh extraction, and attention libraries with PyTorch/Python equivalents. While it is slower (~10x) than the original CUDA implementation, it delivers usable 3D generation.

**Key points:** Outputs OBJ and GLB files with vertex colors. Requires Python 3.11+, 24GB+ unified memory, and ~15GB of disk space. Note that texture export and hole filling are currently disabled due to CUDA dependencies.

**Getting started:** Clone the repository, log in to HuggingFace to access the gated model weights, run the `setup.sh` script, then use `python generate.py path/to/image.png`.

Developer shivampkumar has ported Microsoft's TRELLIS.2, a 4-billion-parameter image-to-3D model, to run on Apple Silicon Macs, crucially with *no* NVIDIA GPU. The original model depends on CUDA and NVIDIA-specific libraries; this port swaps in pure-PyTorch replacements, including custom sparse 3D convolution and attention, with only a few hundred lines of changes. The result generates a 3D mesh (~400K vertices) from a single image in about 3.5 minutes on an M4 Pro. Although slower than running on an H100 GPU, it offers the significant advantages of working offline and removing any cloud dependency. The code is available on GitHub: [https://github.com/shivampkumar/trellis-mac](https://github.com/shivampkumar/trellis-mac).

Original text

Run TRELLIS.2 image-to-3D generation natively on Mac.

This is a port of Microsoft's TRELLIS.2 — a state-of-the-art image-to-3D model — from CUDA-only to Apple Silicon via PyTorch MPS. No NVIDIA GPU required.

Generates 400K+ vertex meshes from single images in ~3.5 minutes on M4 Pro.

Output includes vertex-colored OBJ and GLB files ready for use in 3D applications.
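As a quick sanity check, the exported GLB can be opened with a general-purpose mesh library such as trimesh (not used by this repo); a minimal sketch:

```python
# Optional sanity check of the exported mesh (assumes `pip install trimesh`;
# trimesh is not installed by this repo's setup.sh).
import trimesh

mesh = trimesh.load("output_3d.glb", force="mesh")
print(len(mesh.vertices), "vertices,", len(mesh.faces), "faces")
print(type(mesh.visual))  # vertex colors only; no texture maps in this port
```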

  • macOS on Apple Silicon (M1 or later)
  • Python 3.11+
  • 24GB+ unified memory recommended (the 4B model is large)
  • ~15GB disk space for model weights (downloaded on first run)
```bash
# Clone this repo
git clone https://github.com/shivampkumar/trellis-mac.git
cd trellis-mac

# Log into HuggingFace (needed for gated model weights)
hf auth login

# Request access to these gated models (usually instant approval):
#   https://huggingface.co/facebook/dinov3-vitl16-pretrain-lvd1689m
#   https://huggingface.co/briaai/RMBG-2.0

# Run setup (creates venv, installs deps, clones & patches TRELLIS.2)
bash setup.sh

# Activate the environment
source .venv/bin/activate

# Generate a 3D model from an image
python generate.py path/to/image.png
```

Output files are saved to the current directory (or use --output to specify a path).

```bash
# Basic usage
python generate.py photo.png

# With options
python generate.py photo.png --seed 123 --output my_model --pipeline-type 512

# All options
python generate.py --help
```
| Option | Default | Description |
|---|---|---|
| `--seed` | 42 | Random seed for generation |
| `--output` | output_3d | Output filename (without extension) |
| `--pipeline-type` | 512 | Pipeline resolution: `512`, `1024`, `1024_cascade` |

TRELLIS.2 depends on several CUDA-only libraries. This port replaces them with pure-PyTorch and pure-Python alternatives:

| Original (CUDA) | Replacement | Purpose |
|---|---|---|
| flex_gemm | backends/conv_none.py | Sparse 3D convolution via gather-scatter |
| o_voxel._C hashmap | backends/mesh_extract.py | Mesh extraction from dual voxel grid |
| flash_attn | PyTorch SDPA | Scaled dot-product attention for sparse transformers |
| cumesh | Stub (graceful skip) | Hole filling, mesh simplification |
| nvdiffrast | Stub | Differentiable rasterization (texture export) |

Additionally, all hardcoded .cuda() calls throughout the codebase were patched to use the active device instead.
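Schematically, that device patch boils down to selecting the active device once and moving tensors and modules to it (a minimal sketch, not the repo's literal diff):

```python
import torch

# Pick the active device once instead of hardcoding .cuda() at call sites.
# (Illustrative only; the actual patch edits call sites throughout TRELLIS.2.)
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(4, 8).to(device)          # replaces the hardcoded x.cuda()
model = torch.nn.Linear(8, 8).to(device)  # replaces model.cuda()
print(model(x).device)                    # mps:0 on Apple Silicon, cpu otherwise
```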

Sparse 3D Convolution (backends/conv_none.py): Implements submanifold sparse convolution by building a spatial hash of active voxels, gathering neighbor features for each kernel position, applying weights via matrix multiplication, and scatter-adding results back. Neighbor maps are cached per-tensor to avoid redundant computation.
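A toy sketch of that gather-scatter scheme (hypothetical function name; the real backends/conv_none.py additionally caches the neighbor maps per tensor, as noted above):

```python
import torch

def sparse_conv3d(coords, feats, weight, kernel_size=3):
    """Submanifold sparse 3D convolution via gather-scatter (illustrative).

    coords: (N, 3) integer coordinates of active voxels
    feats:  (N, C_in) features of the active voxels
    weight: (kernel_size**3, C_in, C_out), one weight matrix per kernel offset
    """
    N = feats.shape[0]
    c_out = weight.shape[-1]
    out = feats.new_zeros(N, c_out)

    # Spatial hash: coordinate tuple -> row index of the active voxel.
    table = {tuple(c.tolist()): i for i, c in enumerate(coords)}

    r = kernel_size // 2
    offsets = [(dx, dy, dz)
               for dx in range(-r, r + 1)
               for dy in range(-r, r + 1)
               for dz in range(-r, r + 1)]

    for k, off in enumerate(offsets):
        # For each output voxel, look up the neighbor at coord + offset.
        src, dst = [], []
        for i, c in enumerate(coords.tolist()):
            j = table.get((c[0] + off[0], c[1] + off[1], c[2] + off[2]))
            if j is not None:
                src.append(j)   # gather from the neighbor
                dst.append(i)    # scatter into the center voxel
        if not src:
            continue
        gathered = feats[torch.tensor(src)]            # (M, C_in)
        contrib = gathered @ weight[k]                 # (M, C_out)
        out.index_add_(0, torch.tensor(dst), contrib)  # scatter-add
    return out

# Tiny example: 3 active voxels, 8 input channels, 16 output channels.
coords = torch.tensor([[0, 0, 0], [1, 0, 0], [0, 1, 0]])
feats = torch.randn(3, 8)
weight = torch.randn(27, 8, 16)
print(sparse_conv3d(coords, feats, weight).shape)  # torch.Size([3, 16])
```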

Mesh Extraction (backends/mesh_extract.py): Reimplements flexible_dual_grid_to_mesh using Python dictionaries instead of CUDA hashmap operations. Builds a coordinate-to-index lookup table, finds connected voxels for each edge, and triangulates quads using normal alignment heuristics.
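Two of the core ideas, a plain-dictionary coordinate lookup and the normal-alignment triangulation, can be sketched roughly as follows (illustrative names, not the repo's exact code):

```python
import numpy as np

# Coordinate-to-index lookup table over active voxels, standing in for the
# CUDA hashmap used by o_voxel._C (illustrative data, not real output).
voxel_coords = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]])
coord_to_idx = {tuple(c): i for i, c in enumerate(voxel_coords.tolist())}

def quads_to_triangles(quads, vertices):
    """Split each quad into two triangles along the diagonal whose two
    triangle normals agree best (the normal-alignment heuristic)."""
    tris = []
    for a, b, c, d in quads:
        va, vb, vc, vd = vertices[a], vertices[b], vertices[c], vertices[d]
        # Split along diagonal a-c vs. diagonal b-d.
        n1a, n1b = np.cross(vb - va, vc - va), np.cross(vc - va, vd - va)
        n2a, n2b = np.cross(vc - vb, vd - vb), np.cross(vd - vb, va - vb)
        if np.dot(n1a, n1b) >= np.dot(n2a, n2b):
            tris += [(a, b, c), (a, c, d)]
        else:
            tris += [(b, c, d), (b, d, a)]
    return tris

verts = voxel_coords.astype(float)
print(quads_to_triangles([(0, 1, 2, 3)], verts))  # two triangles per quad
```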

Attention (patched full_attn.py): Adds an SDPA backend to the sparse attention module. Pads variable-length sequences into batches, runs torch.nn.functional.scaled_dot_product_attention, then unpads results.
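In outline, the SDPA path pads the packed sequences into a batch, applies a key-padding mask, and unpads afterwards; a hedged sketch under those assumptions (not the patched module's exact code):

```python
import torch
import torch.nn.functional as F

def varlen_attention(q, k, v, seq_lens):
    """Attention over variable-length sequences via pad -> SDPA -> unpad.

    q, k, v: (total_tokens, heads, dim) packed sequences
    seq_lens: list of per-sequence lengths summing to total_tokens
    """
    heads, dim = q.shape[1], q.shape[2]
    B, max_len = len(seq_lens), max(seq_lens)

    # Pad into (B, heads, max_len, dim) and build a boolean key-padding mask.
    qb = q.new_zeros(B, heads, max_len, dim)
    kb, vb = torch.zeros_like(qb), torch.zeros_like(qb)
    mask = torch.zeros(B, 1, 1, max_len, dtype=torch.bool, device=q.device)
    start = 0
    for b, n in enumerate(seq_lens):
        chunk = slice(start, start + n)
        qb[b, :, :n] = q[chunk].transpose(0, 1)
        kb[b, :, :n] = k[chunk].transpose(0, 1)
        vb[b, :, :n] = v[chunk].transpose(0, 1)
        mask[b, 0, 0, :n] = True
        start += n

    out = F.scaled_dot_product_attention(qb, kb, vb, attn_mask=mask)

    # Unpad back to the packed (total_tokens, heads, dim) layout.
    pieces = [out[b, :, :n].transpose(0, 1) for b, n in enumerate(seq_lens)]
    return torch.cat(pieces, dim=0)

# Two sequences of lengths 3 and 4 packed into 7 tokens.
q = k = v = torch.randn(7, 4, 16)
print(varlen_attention(q, k, v, [3, 4]).shape)  # torch.Size([7, 4, 16])
```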

Benchmarks on M4 Pro (24GB), pipeline type 512:

| Stage | Time |
|---|---|
| Model loading | ~45s |
| Image preprocessing | ~5s |
| Sparse structure sampling | ~15s |
| Shape SLat sampling | ~90s |
| Texture SLat sampling | ~50s |
| Mesh decoding | ~30s |
| **Total** | **~3.5 min** |

Memory usage peaks at around 18GB unified memory during generation.

  • No texture export: Texture baking requires nvdiffrast (CUDA-only differentiable rasterizer). Meshes export with vertex colors only.
  • Hole filling disabled: Mesh hole filling requires cumesh (CUDA). Meshes may have small holes.
  • Slower than CUDA: The pure-PyTorch sparse convolution is ~10x slower than the CUDA flex_gemm kernel. This is the main bottleneck.
  • No training support: Inference only.

The porting code in this repository (backends, patches, scripts) is released under the MIT License.

Upstream model weights are subject to their own licenses:

  • TRELLIS.2 by Microsoft Research — the original model and codebase
  • DINOv3 by Meta — image feature extraction
  • RMBG-2.0 by BRIA AI — background removal