Google introduces Gemini Omni Flash, an AI model that creates video from any input

May 19, 2026


Google has announced Gemini Omni, a new multimodal AI model that the company says can create anything from any input, starting with video generation.

This represents the next step in Google’s multimodal AI strategy. Building on last year’s Nano Banana, which brought Gemini’s intelligence to image generation and editing for millions of users, Gemini Omni combines Gemini’s reasoning abilities with generative capabilities. The first model in this family, Gemini Omni Flash, is now rolling out to users globally.

Conversational video editing changes the game

The standout feature of Gemini Omni Flash is its ability to edit videos through natural language conversations. Users can give instructions that build on previous edits while maintaining character consistency and realistic physics throughout the process.

The model allows users to transform their existing videos in ways that would be impossible to film. Examples include:

Changing materials and textures (turning sculptures into bubbles)
Adding interactive elements (making mirrors ripple like liquid when touched)
Modifying environments and lighting dynamically
Creating complex visual effects synchronized to music

The system maintains continuity across multiple editing rounds, letting users refine their videos through iterative conversations without losing the thread of their original concept.

Real-world knowledge meets creative generation

What sets Gemini Omni apart from other video generation tools is its grounding in real-world knowledge. The AI doesn’t just create visually appealing content – it reasons about physics, history, science, and cultural context to create meaningful and realistic scenes.

The model demonstrates improved understanding of physical forces like gravity, kinetic energy, and fluid dynamics. This allows for more realistic motion and interactions in generated videos. It can also create educational content, such as claymation explainers of complex scientific concepts like protein folding.

For creative projects, Omni can generate comprehensive visual content from brief prompts, such as creating an alphabet video with unusual items for each letter, complete with proper pacing and musical accompaniment.

Multimodal input flexibility

Gemini Omni Flash accepts combinations of text, images, video, and audio as input, creating cohesive outputs from diverse source materials. Users can:

Apply visual styles from reference images to new video content
Transfer motion patterns from one video to different characters or objects
Synchronize visual effects to audio tracks
Transform sketches and drawings into realistic footage

This flexibility allows creators to start with whatever materials they have – whether that’s a rough sketch, a reference photo, or existing video footage – and transform it into polished content that matches their vision.

Responsible AI and digital watermarking

Google has implemented several safeguards for responsible AI use. All videos created with Gemini Omni include SynthID, Google’s imperceptible digital watermark that allows verification through the Gemini app, Chrome, and Google Search.

The company is taking a cautious approach to certain features. While users can create videos with their own voice using digital avatars, Google is still testing broader audio editing capabilities to ensure responsible deployment.

Availability and rollout plans

Gemini Omni Flash is now available to Google AI Plus, Pro, and Ultra subscribers globally through the Gemini app and Google Flow. YouTube users will also get access at no cost through YouTube Shorts and the YouTube Create App starting this week.

Google plans to extend access to developers and enterprise customers through APIs in the coming weeks. The company also indicated that future models in the Omni family will support additional output formats including images and audio.

This launch puts Google in direct competition with other makers of AI video generation tools, such as OpenAI with Sora, and represents a significant step toward more accessible and powerful video creation for both casual users and professional creators.

