
May 19, 2026
Google is taking a major leap forward in AI video creation. The tech giant has announced Gemini Omni, a new multimodal AI model that can create anything from any input, starting with video generation capabilities.
This represents the next step in Google’s multimodal AI strategy. Building on the success of last year’s Nano Banana, which brought Gemini’s intelligence to image generation and editing for millions of users, Gemini Omni combines the AI’s reasoning abilities with creative generation powers. The first model in this family, Gemini Omni Flash, is now rolling out to users globally.
Conversational video editing changes the game
The standout feature of Gemini Omni Flash is its ability to edit videos through natural language conversations. Users can give instructions that build on previous edits while maintaining character consistency and realistic physics throughout the process.
The model allows users to transform their existing videos in ways that would be impossible to film. Examples include:
- Changing materials and textures (turning sculptures into bubbles)
- Adding interactive elements (making mirrors ripple like liquid when touched)
- Modifying environments and lighting dynamically
- Creating complex visual effects synchronized to music
The system maintains continuity across multiple editing rounds, letting users refine their videos through iterative conversations without losing the thread of their original concept.
Real-world knowledge meets creative generation
What sets Gemini Omni apart from other video generation tools is its grounding in real-world knowledge. The AI doesn’t just create visually appealing content – it reasons about physics, history, science, and cultural context to create meaningful and realistic scenes.
The model demonstrates improved understanding of physical forces like gravity, kinetic energy, and fluid dynamics. This allows for more realistic motion and interactions in generated videos. It can also create educational content, such as claymation explainers of complex scientific concepts like protein folding.
For creative projects, Omni can generate comprehensive visual content from brief prompts, such as creating an alphabet video with unusual items for each letter, complete with proper pacing and musical accompaniment.
Multimodal input flexibility
Gemini Omni Flash accepts combinations of text, images, video, and audio as input, creating cohesive outputs from diverse source materials. Users can:
- Apply visual styles from reference images to new video content
- Transfer motion patterns from one video to different characters or objects
- Synchronize visual effects to audio tracks
- Transform sketches and drawings into realistic footage
This flexibility allows creators to start with whatever materials they have – whether that’s a rough sketch, a reference photo, or existing video footage – and transform it into polished content that matches their vision.
Responsible AI and digital watermarking
Google has implemented several safeguards for responsible AI use. All videos created with Gemini Omni include SynthID, Google’s imperceptible digital watermark that allows verification through the Gemini app, Chrome, and Google Search.
The company is taking a cautious approach to certain features. While users can create videos with their own voice using digital avatars, Google is still testing broader audio editing capabilities to ensure responsible deployment.
Availability and rollout plans
Gemini Omni Flash is now available to Google AI Plus, Pro, and Ultra subscribers globally through the Gemini app and Google Flow. YouTube users will also get access at no cost through YouTube Shorts and the YouTube Create App starting this week.
Google plans to extend access to developers and enterprise customers through APIs in the coming weeks. The company also indicated that future models in the Omni family will support additional output formats including images and audio.
This launch positions Gemini Omni directly against other AI video generators such as OpenAI’s Sora and represents a significant step toward more accessible and powerful video creation tools for both casual users and professional creators.




