Gemini Omni

Original link: https://deepmind.google/models/gemini-omni/

Gemini Omni is where Gemini’s ability to reason meets the ability to create. It delivers a leap in world understanding, multimodality, and editing.


Edit through natural conversation

Think of Gemini Omni like Nano Banana – but for video. Build and fine-tune your creation at any step with natural language.

Reimagine the action

Switch up what happens in your videos, from the ordinary to the spectacular.

Prompt: Make it look like the weird shape of my hand hole super zooms and magnifies the ground it's looking at in sharper quality.

Prompt: When the finger in

Prompt: The lights of the apartments start turning on in sync with the music.

Edit over multiple turns, with consistency

Craft your scene step-by-step, changing specific details, environments, camera angles, and more.

Input video

Prompt: Transport the violinist to the image environment

Prompt: Make the violin invisible

Prompt: Change the camera angle to be over the violinist’s shoulder.

Swap in different objects or characters with natural language

Replace characters and objects in your video just by asking, all while maintaining a coherent, cohesive scene.

Prompt: Change spaceship to


Bring ideas to life, grounded in Gemini’s world knowledge

Create scenes that follow real-world logic. Gemini Omni pulls from its deep knowledge of history, biology, and narrative logic to construct compelling stories.

Create output that follows real-world physics

Omni has an intuitive understanding of forces like gravity, kinetic energy, and fluid dynamics for more realistic movement.

Prompt: A marble rolling fast on a chain reaction style track, continuous smooth shot

Draw on real-world history, science, and math

Omni understands world history, science, and math – and knows how to craft stories around them.

Prompt: claymation explainer of protein folding, everything is made out of clay, no hands, stop motion, accurate

Prompt: A skeuomorphism stop motion explainer about how the brain hippocampus works with a compelling voiceover. Don’t add seahorses. No voice cuts at the end. Don’t add text.

Sync text with onscreen action

Go beyond just rendering realistic text. Create videos that coherently connect text to what’s happening in the video.

Prompt: The video shows items of the alphabet. An unusual item starting with each letter is shown sitting on a table (like a Capybara for C, disco globe for D and Lava Lamp for L). All 26 letters must be represented by 26 items with matching lower thirds displaying the letter. Only one item and lower third at a time. Each lower third must look like a black marker written on a slip of paper in the bottom left. Rapid fire, roughly 9 frames per item at 24FPS. Last frame is a slip of paper "THE END". The whole video is accompanied by calm smooth music.
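For a sense of scale, the timing constraints in that prompt can be worked out directly. The sketch below only does the arithmetic implied by the prompt's own numbers (26 items, roughly 9 frames each, 24 FPS). It assumes the closing "THE END" slip takes a single frame, which the prompt suggests but does not state precisely, and it says nothing about the length of the video the model actually produced. The function and constant names are illustrative.

```python
from fractions import Fraction

ITEMS = 26            # one item per letter, as requested in the prompt
FRAMES_PER_ITEM = 9   # "roughly 9 frames per item"
FPS = 24              # "at 24FPS"
END_FRAMES = 1        # assumption: the "THE END" slip occupies one frame


def clip_timing(items=ITEMS, frames_per_item=FRAMES_PER_ITEM,
                fps=FPS, end_frames=END_FRAMES):
    """Return (total frames, seconds per item, total seconds)."""
    total_frames = items * frames_per_item + end_frames
    seconds_per_item = Fraction(frames_per_item, fps)
    total_seconds = Fraction(total_frames, fps)
    return total_frames, float(seconds_per_item), float(total_seconds)


if __name__ == "__main__":
    frames, per_item, total = clip_timing()
    print(f"{frames} frames, {per_item:.3f} s per item, {total:.2f} s total")
```

Under these assumptions each item is on screen for 0.375 seconds and the whole clip runs just under 10 seconds, which is why the prompt describes the pacing as rapid fire.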

Prompt: word by word, one word on a the screen at a time: did, you, know, that, this, model, can, do, pretty, good, text!? each word appears with a different animated style, perfect pacing to a rhythm, sizzle reel


Reference anything

Reference and combine ingredients to maintain control and consistency over your scene.

Creating your prompts

Use our prompt guide to create realistic, coherent, and creative output.

Safety

From development to deployment

Gemini Omni Flash was developed in partnership with internal safety, security, and responsibility teams. A range of evaluations and red teaming activities were conducted to help improve the model and inform decision-making. These evaluations and activities align with Google's AI Principles and responsible AI approach, as well as Google's Generative AI policies (e.g. Gen AI Prohibited Use Policy and the Gemini API Additional Terms of Service). Evaluation types included but were not limited to:

Training/development evaluations including automated and human evaluations carried out continuously throughout and after the model’s training, to monitor its progress and performance

Human red teaming conducted by specialist teams who sit outside of the model development team, across the policies and desiderata, deliberately trying to spot weaknesses and ensure the model adheres to safety policies and desired outcomes

Automated red teaming to dynamically evaluate Gemini Omni Flash for safety and security considerations at scale, complementing human red teaming and static evaluations

Ethics and safety reviews conducted ahead of the model’s release

Content created or edited with Omni in the Gemini app, Google Flow or YouTube includes our imperceptible SynthID digital watermark and C2PA Content Credentials. You can verify content through the Gemini app, and verification is coming soon to Chrome and Search. Our blog post explains how we're expanding our content transparency and verification tools to help you understand how content across the web was created and edited.


Try Gemini Omni


Google AI subscription required. Features vary by tier and geography.
