Google Releases Gemini 3.1 Flash Live: Real-Time Multimodal Voice Model for AI Agents


Google launches Gemini 3.1 Flash Live, a real-time multimodal voice model for AI agents with low-latency audio, video and tool use. Available now in preview.

Written by: Jorick van Weelie


March 27, 2026

Google has released Gemini 3.1 Flash Live, a real-time multimodal voice model built for low-latency audio, video and tool use in AI agent workflows. The model is now available in preview through the Gemini Live API in Google AI Studio. Google describes it as its highest-quality audio and voice model to date, and is using it to power upgrades to both Gemini Live and Search Live across more than 200 countries.

What Gemini 3.1 Flash Live does

Gemini 3.1 Flash Live is a voice-first model designed for real-time, multimodal conversations. It processes audio, video and text inputs simultaneously, enabling interactive exchanges with minimal delay. The model supports over 90 languages and is optimized for scenarios where responsiveness matters, such as voice assistants, customer service agents and real-time search interactions.
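For developers curious what a session looks like, here is a minimal sketch using the google-genai Python SDK's Live API surface as documented for earlier Live models. The model ID gemini-3.1-flash-live is an assumption based on the naming in this announcement, and the preview API may differ, so check Google AI Studio for the current identifier.

```python
# Minimal sketch of a Gemini Live API session (google-genai Python SDK).
# The model ID "gemini-3.1-flash-live" is an assumption based on this
# announcement; verify the exact preview identifier in Google AI Studio.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    config = {"response_modalities": ["AUDIO"]}  # voice-first output
    async with client.aio.live.connect(
        model="gemini-3.1-flash-live", config=config
    ) as session:
        # Send a text turn; audio and video frames stream over the
        # same session in a real application.
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "What can you do?"}]}
        )
        # Responses arrive incrementally as the model speaks.
        async for response in session.receive():
            if response.data is not None:
                pass  # raw audio bytes; feed these to an audio player

asyncio.run(main())
```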

On the consumer side, the model now powers Gemini Live on Android and iOS. Google says conversations feel noticeably faster, with fewer pauses between turns. The model also dynamically adjusts response length and tone based on conversational context, which should make extended brainstorming sessions more natural.

Key capabilities and improvements over previous versions

Compared to its predecessor, Gemini 2.5 Flash Native Audio, the new model offers lower latency and better audio quality. It is more effective at recognizing acoustic nuances like pitch and pace, and significantly better at filtering out background noise from sources such as traffic or television. This makes it more practical for use in noisy, real-world environments rather than just controlled settings.

Conversation memory has also improved. Gemini Live can now follow the thread of a conversation for roughly twice as long as before, keeping context intact during longer sessions. For developers, the model shows improved instruction-following, better adherence to complex system prompts and a stronger ability to trigger external tools mid-conversation. These are important capabilities for anyone building agentic applications on top of the API.
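That tool-triggering ability maps onto the Live API's existing function-calling mechanism: you declare tools in the session config, and the model emits tool calls mid-conversation for your code to execute. The sketch below shows roughly how a declaration looks in the google-genai SDK; the get_order_status function is purely illustrative, not part of any Google API.

```python
# Illustrative sketch: declaring a tool the model can trigger during a
# live conversation. "get_order_status" is a hypothetical example.
from google.genai import types

get_order_status = types.FunctionDeclaration(
    name="get_order_status",
    description="Look up the shipping status of a customer order.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"order_id": types.Schema(type=types.Type.STRING)},
        required=["order_id"],
    ),
)

config = {
    "response_modalities": ["AUDIO"],
    "tools": [types.Tool(function_declarations=[get_order_status])],
}
# Pass `config` to client.aio.live.connect(...) as in the earlier sketch.
# When the model emits a tool call, run your function and return the
# result with session.send_tool_response(...).
```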

Search Live goes global

One of the more notable product implications of this release is the global expansion of Search Live. Powered by Gemini 3.1 Flash Live, Search Live is rolling out to over 200 countries and now incorporates both audio and video (via Google Lens) for interactive searches. Users can point their camera at an object and have a spoken conversation about it, or ask follow-up questions in natural language while browsing search results.

This positions Google to compete more directly with voice-first search experiences from other providers. The combination of low-latency voice interaction, visual input and web search creates a notably different experience from typing queries into a search bar.

Availability and developer access

Gemini 3.1 Flash Live is available in preview through the Gemini Live API in Google AI Studio. Enterprise customers can access it for customer experience applications. Google has not yet published pricing details for API access, though the Flash tier has historically been positioned as the cost-efficient option in Google’s model lineup. For comparison, the Gemini 3.1 Flash-Lite variant, which launched earlier this month, is priced at $0.25 per million input tokens.
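As a rough yardstick against that published Flash-Lite rate, the arithmetic is simple; the token volume below is invented for illustration, and Live API audio pricing may well be metered differently once announced.

```python
# Back-of-envelope cost estimate at the published Flash-Lite input rate.
# The token count is hypothetical; Gemini 3.1 Flash Live pricing itself
# has not been announced.
INPUT_RATE_PER_M = 0.25  # USD per million input tokens (Flash-Lite)

input_tokens = 40_000_000  # hypothetical monthly input volume
cost = input_tokens / 1_000_000 * INPUT_RATE_PER_M
print(f"${cost:.2f}")  # -> $10.00
```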

For full technical details, see the official model card on Google DeepMind and the Google blog announcement.