Apple releases open-source model that instantly turns 2D photos into 3D views

Original link: https://github.com/apple/ml-sharp

## SHARP: Fast, Photorealistic View Synthesis

SHARP is a new method that creates photorealistic views of a scene from a *single* input image in under a second. It does this by using a neural network to rapidly regress a 3D Gaussian representation of the scene. This representation enables real-time rendering of high-resolution images from new viewpoints and has accurate metric scale, allowing realistic camera motion.

SHARP substantially outperforms previous methods, reducing the image-difference metrics LPIPS by 25–34% and DISTS by 21–43% while speeding up synthesis by three orders of magnitude.

The software provides a command-line interface for predicting 3D Gaussian splats (.ply files) from images, with an option to render videos along a specified camera trajectory (rendering requires a CUDA GPU). Installation is straightforward using `conda` and `pip`. Pretrained models are downloaded automatically, or a checkpoint can be specified manually.

Further details, quantitative results, and qualitative examples can be found in the accompanying research paper and online demo. If you use this work, please be sure to cite the paper.



Project Page | arXiv

This software project accompanies the research paper: Sharp Monocular View Synthesis in Less Than a Second by Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan Richter and Vladlen Koltun.

We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25–34% and DISTS by 21–43% versus the best prior model, while lowering the synthesis time by three orders of magnitude.
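To make the pipeline shape concrete, here is a deliberately simplified, hypothetical PyTorch sketch of what regressing a 3D Gaussian representation in a single feedforward pass can look like: a small convolutional network maps one image to per-pixel Gaussian parameters (position, scale, rotation, opacity, color). The module names and parameterization are illustrative assumptions; the actual SHARP architecture is described in the paper.

```python
import torch
import torch.nn as nn

class GaussianRegressor(nn.Module):
    """Hypothetical sketch: map one RGB image to per-pixel 3D Gaussian parameters."""

    # 3 (mean) + 3 (log-scale) + 4 (rotation quaternion) + 1 (opacity) + 3 (color) = 14
    PARAMS_PER_GAUSSIAN = 14

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, self.PARAMS_PER_GAUSSIAN, 1),
        )

    def forward(self, image: torch.Tensor) -> dict[str, torch.Tensor]:
        # image: [B, 3, H, W] -> one Gaussian per pixel, flattened to [B, H*W, ...]
        raw = self.net(image)                      # [B, 14, H, W]
        b, _, h, w = raw.shape
        raw = raw.permute(0, 2, 3, 1).reshape(b, h * w, self.PARAMS_PER_GAUSSIAN)
        means, log_scales, quats, opacity, color = raw.split([3, 3, 4, 1, 3], dim=-1)
        return {
            "means": means,                                    # metric 3D positions
            "scales": log_scales.exp(),                        # positive extents
            "quats": nn.functional.normalize(quats, dim=-1),   # unit rotations
            "opacities": torch.sigmoid(opacity).squeeze(-1),
            "colors": torch.sigmoid(color),
        }

# Single feedforward pass: one image in, a renderable Gaussian set out.
model = GaussianRegressor()
gaussians = model(torch.rand(1, 3, 256, 384))
print({k: tuple(v.shape) for k, v in gaussians.items()})
```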

We recommend first creating a Python environment:

conda create -n sharp python=3.13
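Activate the new environment before installing the dependencies (standard conda usage, with the environment name from the command above):

conda activate sharp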

Afterwards, you can install the required dependencies using

pip install -r requirements.txt

To test the installation, run

To run prediction:

sharp predict -i /path/to/input/images -o /path/to/output/gaussians

The model checkpoint will be downloaded automatically on first run and cached locally at ~/.cache/torch/hub/checkpoints/.
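The path above is PyTorch's default hub cache. If you would rather keep the checkpoint elsewhere (for example, on a larger disk), the cache can be redirected with the standard TORCH_HOME environment variable before running the command; this is generic PyTorch behavior rather than a SHARP-specific option:

# downloads then land in $TORCH_HOME/hub/checkpoints/
export TORCH_HOME=/path/to/alternate/cache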

Alternatively, you can download the model directly:

wget https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt

To use a manually downloaded checkpoint, specify it with the -c flag:

sharp predict -i /path/to/input/images -o /path/to/output/gaussians -c sharp_2572gikvuh.pt
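As an alternative to the -c flag (an untested suggestion based on the cache location noted above), the downloaded file can be placed into the default cache directory so that the automatic lookup finds it:

mkdir -p ~/.cache/torch/hub/checkpoints
mv sharp_2572gikvuh.pt ~/.cache/torch/hub/checkpoints/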

The results will be 3D Gaussian splats (3DGS) in the output folder. The 3DGS .ply files are compatible with various public 3DGS renderers. We follow the OpenCV coordinate convention (x right, y down, z forward), and the 3DGS scene center lies roughly at (0, 0, +z). When using third-party renderers, please scale and rotate to re-center the scene accordingly.
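As one illustration of such an adjustment, the sketch below recenters the Gaussian means and flips them from the OpenCV convention into the OpenGL-style convention (x right, y up, z backward) that some viewers expect. It uses the `plyfile` package, the file paths are placeholders, and it assumes the standard 3DGS layout in which the means are stored as the `x`, `y`, `z` vertex properties. A full conversion would also rotate the per-Gaussian orientations and view-dependent color coefficients, which this sketch leaves untouched.

```python
import numpy as np
from plyfile import PlyData

# Placeholder paths; point these at a .ply produced by `sharp predict`.
src = "/path/to/output/gaussians/example.ply"
dst = "/path/to/output/gaussians/example_opengl.ply"

ply = PlyData.read(src)
vertex = ply["vertex"]

# Gather the Gaussian means (standard 3DGS .ply layout: x, y, z properties).
means = np.stack([vertex["x"], vertex["y"], vertex["z"]], axis=1).astype(np.float64)

# Re-center: the SHARP scene center sits roughly at (0, 0, +z).
means -= means.mean(axis=0)

# OpenCV (x right, y down, z forward) -> OpenGL (x right, y up, z backward):
# flip the y and z axes. Orientations and SH coefficients are not rotated here.
means = means @ np.diag([1.0, -1.0, -1.0])

# Write the transformed means back in place of the originals.
vertex["x"][:] = means[:, 0]
vertex["y"][:] = means[:, 1]
vertex["z"][:] = means[:, 2]
ply.write(dst)
```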

## Rendering trajectories (CUDA GPU only)

Additionally, you can render videos along a camera trajectory. While Gaussian prediction works on CPU, CUDA, and MPS devices, rendering videos via the --render option currently requires a CUDA GPU. The gsplat renderer takes a while to initialize on first launch.

sharp predict -i /path/to/input/images -o /path/to/output/gaussians --render

# Or from the intermediate gaussians:
sharp render -i /path/to/output/gaussians -o /path/to/output/renderings
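For context on what the gsplat renderer is doing under the hood, here is a minimal, self-contained sketch that calls gsplat's rasterization function directly on a handful of random toy Gaussians. This is not how the sharp CLI drives the renderer, the camera values are arbitrary placeholders, and it assumes gsplat's current rasterization API (and, as noted above, a CUDA GPU).

```python
import torch
from gsplat import rasterization  # third-party 3DGS renderer; requires a CUDA GPU

device = "cuda"
n = 1000  # random toy Gaussians, just to exercise the renderer

means = torch.randn(n, 3, device=device)               # [N, 3] positions
quats = torch.randn(n, 4, device=device)
quats = quats / quats.norm(dim=-1, keepdim=True)       # [N, 4] unit quaternions
scales = 0.05 * torch.rand(n, 3, device=device)        # [N, 3] per-axis extents
opacities = torch.rand(n, device=device)               # [N]
colors = torch.rand(n, 3, device=device)               # [N, 3] RGB

# One camera: a world-to-camera matrix (OpenCV convention, z forward)
# and pinhole intrinsics.
viewmats = torch.eye(4, device=device)[None]           # [1, 4, 4]
viewmats[0, 2, 3] = 4.0                                # scene sits 4 units in front of the camera
Ks = torch.tensor([[[300.0, 0.0, 320.0],
                    [0.0, 300.0, 240.0],
                    [0.0, 0.0, 1.0]]], device=device)  # [1, 3, 3]

rgb, alpha, meta = rasterization(
    means, quats, scales, opacities, colors,
    viewmats, Ks, width=640, height=480,
)
print(rgb.shape)  # torch.Size([1, 480, 640, 3])
```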

Please refer to the paper for both quantitative and qualitative evaluations. Additionally, please check out the qualitative examples page, which contains several video comparisons against related work.

If you find our work useful, please cite the following paper:

@article{Sharp2025:arxiv,
  title      = {Sharp Monocular View Synthesis in Less Than a Second},
  author     = {Lars Mescheder and Wei Dong and Shiwei Li and Xuyang Bai and Marcel Santos and Peiyun Hu and Bruno Lecouat and Mingmin Zhen and Ama\"{e}l Delaunoy and Tian Fang and Yanghai Tsin and Stephan R. Richter and Vladlen Koltun},
  journal    = {arXiv preprint arXiv:2512.10685},
  year       = {2025},
  url        = {https://arxiv.org/abs/2512.10685},
}

Our codebase is built on multiple open-source contributions; please see ACKNOWLEDGEMENTS for more details.

Please check out the repository LICENSE before using the provided code, and LICENSE_MODEL before using the released models.
