Bringing 3D shoppable products online with generative AI

Original link: https://research.google/blog/bringing-3d-shoppable-products-online-with-generative-ai/

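By fine-tuning Google's Veo model, we achieved state-of-the-art video generation that can produce 360° product videos from just one or a few images. To draw on Veo's strength in capturing complex interactions between light, material, and geometry, we curated a large dataset of synthetic 3D assets and rendered them from different angles and under different lighting. This dataset of paired images and videos was used to train Veo to generate consistent 360° spin videos from input images. The result is a system that generalizes effectively across product categories such as furniture, apparel, and electronics. Notably, Veo overcomes the limitations of earlier approaches and can accurately render complex lighting effects and material properties, such as reflections on shiny surfaces, producing realistic, high-quality 360° product visualizations.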

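This Hacker News thread discusses Google's research on using generative AI to create shoppable 3D product models from 2D images. Some commenters worry about the accuracy of these AI-generated models and their potential for "hallucinations," questioning their reliability for retailers and the risk of false advertising. Others highlight the potential benefit to small sellers who lack the resources for professional photography or CAD models. The technology is seen as a significant advance for 3D shopping, addressing challenges such as realistic lighting, textures, and integration into larger scenes. Some users, however, feel the focus is misplaced, arguing that problems such as searching online stores for a specific garment remain unsolved. They also note that no ready-to-try version of the technology is available, and one commenter suggests that manufacturers share a common 3D model format, which would remove the need for AI generation.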

Original text

Our latest breakthrough builds on Veo, Google's state-of-the-art video generation model. A key strength of Veo is its ability to generate videos that capture complex interactions between light, material, texture, and geometry. Its powerful diffusion-based architecture and its ability to be finetuned on a variety of multi-modal tasks enable it to excel at novel view synthesis.

To finetune Veo to transform product images into a consistent 360° video, we first curated a dataset of millions of high-quality synthetic 3D assets. We then rendered the 3D assets from various camera angles and under various lighting conditions. Finally, we created a dataset of paired images and videos and trained Veo with supervision to generate 360° spins conditioned on one or more images.
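
The data preparation described above can be pictured with a small sketch. This is not the authors' pipeline: the names `CameraPose`, `orbit_poses`, and `make_training_pair`, the lighting preset labels, and the choice of a fixed-elevation turntable orbit are all assumptions made for illustration. The sketch only computes camera positions for a 360° spin around an asset and selects which of those views would serve as conditioning images; actual rendering and model training are out of scope.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical camera pose on an orbit around an asset placed at the origin."""
    azimuth_deg: float
    elevation_deg: float
    position: Tuple[float, float, float]


def orbit_poses(num_frames: int, radius: float = 2.0,
                elevation_deg: float = 15.0) -> List[CameraPose]:
    """Return evenly spaced poses covering one full 360-degree turn."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    elev = math.radians(elevation_deg)
    poses = []
    for i in range(num_frames):
        az_deg = 360.0 * i / num_frames
        az = math.radians(az_deg)
        # Spherical-to-Cartesian conversion with z as the up axis.
        x = radius * math.cos(elev) * math.cos(az)
        y = radius * math.cos(elev) * math.sin(az)
        z = radius * math.sin(elev)
        poses.append(CameraPose(az_deg, elevation_deg, (x, y, z)))
    return poses


def make_training_pair(asset_id: str, num_frames: int = 24, num_inputs: int = 1,
                       lighting_presets: Sequence[str] = ("studio", "outdoor", "warm"),
                       rng: Optional[random.Random] = None) -> Dict[str, object]:
    """Describe one (conditioning images, target spin video) pair for an asset.

    The preset names are placeholders; the source only says that lighting
    conditions were varied.
    """
    if not 1 <= num_inputs <= num_frames:
        raise ValueError("num_inputs must be between 1 and num_frames")
    rng = rng or random.Random()
    poses = orbit_poses(num_frames)
    lighting = rng.choice(list(lighting_presets))
    # Pick which frames of the spin act as the "one or more" input images.
    input_indices = sorted(rng.sample(range(num_frames), num_inputs))
    return {
        "asset_id": asset_id,
        "lighting": lighting,
        "target_video": poses,
        "conditioning_views": [poses[i] for i in input_indices],
    }


if __name__ == "__main__":
    pair = make_training_pair("chair_001", num_frames=12, num_inputs=3,
                              rng=random.Random(0))
    print(pair["lighting"], [p.azimuth_deg for p in pair["conditioning_views"]])
```

Sampling the conditioning views from the same orbit as the target video is one simple way to keep inputs and supervision consistent; a real pipeline might instead render the input images from independent viewpoints.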

We discovered that this approach generalized effectively across a diverse set of product categories, including furniture, apparel, electronics and more. Veo was not only able to generate novel views that adhered to the available product images, but it was also able to capture complex lighting and material interactions (e.g., shiny surfaces), which was challenging for the first- and second-generation approaches.
