Veo

Original link: https://deepmind.google/technologies/veo/

Veo is Google's latest advanced video generation model, capable of creating high-resolution, long-form videos in a wide range of cinematic styles. It interprets text prompts in depth, producing footage that matches the intended nuance and tone. Through innovative controls, users can edit existing videos or combine images with text prompts to produce customized output. In the coming weeks, select creators will gain access through VideoFX, and future plans include bringing Veo's capabilities to YouTube Shorts and other products. Advanced techniques enable seamless transitions between frames and adherence to an image's style for consistent results. This work builds on years of research, including the integration of state-of-the-art Transformer architecture, Gemini, and latent diffusion transformers. To ensure responsible use, new safety measures filter out potential risks such as privacy and copyright violations before videos are released. Collaboration with leading creators drives continuous improvement and shapes the technology for diverse applications.

The user expresses concern about AI displacing jobs tied to artistic creation, particularly work that involves manipulating color, light, and spatial awareness. They identify as dyslexic and note the irony that an AI which requires written descriptions may limit opportunities for people who struggle with written expression. At the same time, they acknowledge that AI opens doors for those who cannot take on traditional artistic roles. Despite reservations about AI's current ability to write essays or grasp complex concepts beyond a given prompt, they envision future interfaces that let users keep manipulating visual elements while leaving the writing process to the AI. They reflect on how markedly an AI's rendering of the northern lights differs from actual observation, and discuss what this might mean for testing AI's realism and perceptual understanding. Ultimately, they remain optimistic about AI's potential to expand our creativity, but stress the importance of continued improvement and ethical consideration. They give examples of AI failures and limitations, including difficulty producing a "realistic" chessboard or recognizing subtle distinctions in natural phenomena, and question the feasibility of producing genuinely original content. Throughout, they emphasize the human role in guiding and controlling the outcomes of AI applications, acknowledging the ongoing learning curve and the challenge of mastering this rapidly evolving technology.

Original article

Veo is our most capable video generation model to date. It generates high-quality, 1080p resolution videos that can go beyond a minute, in a wide range of cinematic and visual styles.

It accurately captures the nuance and tone of a prompt, and provides an unprecedented level of creative control — understanding prompts for all kinds of cinematic effects, like time lapses or aerial shots of a landscape.

Our video generation model will help create tools that make video production accessible to everyone. Whether you're a seasoned filmmaker, aspiring creator, or educator looking to share knowledge, Veo unlocks new possibilities for storytelling, education and more.

Over the coming weeks some of these features will be available to select creators through VideoFX, a new experimental tool at labs.google. You can join the waitlist now.

In the future, we’ll also bring some of Veo’s capabilities to YouTube Shorts and other products.

Greater understanding of language and vision

To produce a coherent scene, generative video models need to accurately interpret a text prompt and combine this information with relevant visual references.

With advanced understanding of natural language and visual semantics, Veo generates video that closely follows the prompt. It accurately captures the nuance and tone in a phrase, rendering intricate details within complex scenes.

Controls for film-making

When given both an input video and editing command, like adding kayaks to an aerial shot of a coastline, Veo can apply this command to the initial video and create a new, edited video.

In addition, it supports masked editing, enabling changes to specific areas of the video when you add a mask area to your video and text prompt.

Veo can also generate a video with an image as input along with the text prompt. By providing a reference image in combination with a text prompt, it conditions Veo to generate a video that follows the image’s style and user prompt’s instructions.

The model is also able to make video clips and extend them to 60 seconds and beyond. It can do this either from a single prompt, or by being given a sequence of prompts which together tell a story.
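To make the combinations of inputs described above concrete, here is a minimal, purely hypothetical sketch in Python of how such a request might be structured. VideoFX's actual interface is not public, so the `VeoRequest` type and its field names are illustrative assumptions, not a real API.

```python
# Hypothetical sketch only: illustrates the kinds of inputs the text describes
# (a text prompt or a sequence of prompts, an optional reference image,
# optional source video plus a mask for masked editing, and a target duration).
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class VeoRequest:
    prompts: List[str]                       # one prompt, or a sequence that tells a story
    reference_image: Optional[bytes] = None  # conditions output on the image's style
    source_video: Optional[bytes] = None     # existing footage to edit
    mask: Optional[bytes] = None             # restricts edits to a region (masked editing)
    duration_seconds: int = 60               # clips can be extended to 60 s and beyond

# Example: the "add kayaks to an aerial shot of a coastline" edit from the text.
request = VeoRequest(
    prompts=["add kayaks paddling along the coastline"],
    source_video=b"<raw bytes of an aerial coastline clip>",  # placeholder bytes
)
```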

Consistency across video frames

Maintaining visual consistency can be a challenge for video generation models. Characters, objects, or even entire scenes can flicker, jump, or morph unexpectedly between frames, disrupting the viewing experience.

Veo's cutting-edge latent diffusion transformers reduce the appearance of these inconsistencies, keeping characters, objects and styles in place, as they would in real life.
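As a rough illustration of what a latent diffusion transformer does, the sketch below (plain PyTorch, not Veo's actual architecture, dimensions, or training objective) denoises a grid of spatio-temporal latent tokens. Because tokens from all frames attend to one another, the model can keep characters, objects, and styles coherent across time rather than treating each frame independently.

```python
# Minimal conceptual sketch of a latent diffusion transformer denoiser.
import torch
import torch.nn as nn

class LatentDiffusionTransformer(nn.Module):
    def __init__(self, latent_dim=16, model_dim=256, heads=8, layers=6):
        super().__init__()
        self.proj_in = nn.Linear(latent_dim, model_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=model_dim, nhead=heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(encoder_layer, num_layers=layers)
        self.proj_out = nn.Linear(model_dim, latent_dim)

    def forward(self, noisy_latents, t_embed):
        # noisy_latents: (batch, frames * patches, latent_dim)
        x = self.proj_in(noisy_latents) + t_embed  # inject the diffusion timestep
        x = self.backbone(x)                       # full spatio-temporal self-attention
        return self.proj_out(x)                    # predicted noise for this step

# One denoising step over a 16-frame clip with 64 latent patches per frame.
model = LatentDiffusionTransformer()
latents = torch.randn(1, 16 * 64, 16)
t_embed = torch.zeros(1, 1, 256)  # placeholder timestep embedding
noise_estimate = model(latents, t_embed)
```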

Built upon years of video generation research

Veo builds upon years of generative video model work including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere, and also our Transformer architecture and Gemini.

To help Veo understand and follow prompts more accurately, we have also added more details to the captions of each video in its training data. And to further improve performance, the model uses high-quality, compressed representations of video (also known as latents) so it’s more efficient too. These steps improve overall quality and reduce the time it takes to generate videos.
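A quick back-of-the-envelope calculation shows why operating on compressed latents reduces the time it takes to generate videos. The compression factors and latent channel count below are assumptions chosen for illustration, not Veo's published figures.

```python
# Rough comparison of raw pixel values vs. compressed latent values
# for a one-minute 1080p clip, under assumed compression factors.
frames, height, width, channels = 60 * 24, 1080, 1920, 3   # 60 s at 24 fps, RGB
pixels = frames * height * width * channels

spatial_factor, temporal_factor = 8, 4                      # assumed autoencoder compression
latent_channels = 16
latents = (frames // temporal_factor) * (height // spatial_factor) \
          * (width // spatial_factor) * latent_channels

print(f"raw values:    {pixels:,}")
print(f"latent values: {latents:,}  (~{pixels / latents:.0f}x fewer)")
```

Under these assumed factors the model would work with roughly 48 times fewer values per clip, which is the kind of saving that makes long, high-resolution generation tractable.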

Responsible by design

It's critical to bring technologies like Veo to the world responsibly. Videos created by Veo are watermarked using SynthID, our cutting-edge tool for watermarking and identifying AI-generated content, and passed through safety filters and memorization checking processes that help mitigate privacy, copyright and bias risks.

Veo’s future will be informed by our work with leading creators and filmmakers. Their feedback helps us improve our generative video technologies and makes sure they benefit the wider creative community and beyond.

Note: All videos on this page were generated by Veo and have not been modified.

Acknowledgements

This work was made possible by the exceptional contributions of: Abhishek Sharma, Adams Yu, Ali Razavi, Andeep Toor, Andrew Pierson, Ankush Gupta, Austin Waters, Daniel Tanis, Dumitru Erhan, Eric Lau, Eleni Shaw, Gabe Barth-Maron, Greg Shaw, Han Zhang, Henna Nandwani, Hernan Moraldo, Hyunjik Kim, Irina Blok, Jakob Bauer, Jeff Donahue, Junyoung Chung, Kory Mathewson, Kurtis David, Lasse Espeholt, Marc van Zee, Matt McGill, Medhini Narasimhan, Miaosen Wang, Mikołaj Bińkowski, Mohammad Babaeizadeh, Mohammad Taghi Saffar, Nick Pezzotti, Pieter-Jan Kindermans, Poorva Rane, Rachel Hornung, Robert Riachi, Ruben Villegas, Rui Qian, Sander Dieleman, Serena Zhang, Serkan Cabi, Shixin Luo, Shlomi Fruchter, Signe Nørly, Srivatsan Srinivasan, Tobias Pfaff, Tom Hume, Vikas Verma, Weizhe Hua, William Zhu, Xinchen Yan, Xinyu Wang, Yelin Kim, Yuqing Du and Yutian Chen.

We extend our gratitude to Aida Nematzadeh, Alex Cullum, April Lehman, Aäron van den Oord, Benigno Uria, Charlie Chen, Charlie Nash, Charline Le Lan, Conor Durkan, Cristian Țăpuș, David Bridson, David Ding, David Steiner, Emanuel Taropa, Evgeny Gladchenko, Frankie Garcia, Gavin Buttimore, Geng Yan, Greg Shaw, Hadi Hashemi, Harsha Vashisht, Hartwig Adam, Huisheng Wang, Jacob Austin, Jacob Kelly, Jacob Walker, Jim Lin, Jonas Adler, Joost van Amersfoort, Jordi Pont-Tuset, Josh Newlan, Josh V. Dillon, Junwhan Ahn, Kelvin Xu, Kristian Kjems, Lois Zhou, Luis C. Cobo, Maigo Le, Malcolm Reynolds, Marcus Wainwright, Mary Cassin, Mateusz Malinowski, Matt Smart, Matt Young, Mingda Zhang, Minh Giang, Moritz Dickfeld, Nancy Xiao, Nelly Papalampidi, Nir Shabat, Oliver Woodman, Ollie Purkiss, Oskar Bunyan, Patrice Oehen, Pauline Luc, Pete Aykroyd, Petko Georgiev, Phil Chen, RJ Mical, Rakesh Shivanna, Ramya Ganeshan, Richard Nguyen, Robin Strudel, Rohan Anil, Sam Haves, Shanshan Zheng, Sholto Douglas, Siddhartha Brahma, Tatiana López, Tobias Pfaff, Victor Gomes, Vighnesh Birodkar, Xin Chen, Yaroslav Ganin, Yi-Ling Wang, Yilin Ma, Yori Zwols, Yu Qiao, Yuchen Liang, Yusuf Aytar and Zu Kim for their invaluable partnership in developing and refining key components of this project.

Special thanks to Douglas Eck, Nando de Freitas, Oriol Vinyals, Eli Collins, Koray Kavukcuoglu and Demis Hassabis for their insightful guidance and support throughout the research process.

We also acknowledge the many other individuals who contributed across Google DeepMind and our partners at Google.
