On December 17, 2025, Google officially released Gemini 3 Flash, the latest addition to the Gemini 3 family, designed to deliver frontier-class intelligence at a fraction of the cost. Gemini 3 Flash replaces the previous 2.5 Flash as the default model in the Gemini app and in AI Mode in Search. The new model is engineered to bridge the gap between high-speed efficiency and complex reasoning, rivaling larger flagship models while maintaining the low latency required for real-time applications.
Frontier intelligence at Flash-level speed
Gemini 3 Flash marks a significant shift in AI architecture by providing “Pro-grade” reasoning within a lightweight framework. According to internal benchmarks and third-party analysis, the model is 3x faster than Gemini 2.5 Pro while significantly outperforming it across nearly every category.
A standout feature of this release is the introduction of Thinking Levels. For the first time, users and developers can modulate the model's reasoning depth. By selecting one of four distinct states (Minimal, Low, Medium, or High), users can prioritize instant response times for simple tasks or allow the model "time to think" for complex agentic workflows and coding.
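In practice, a developer might pick a thinking level per request based on task complexity. The sketch below shows one such heuristic; the commented-out API call is a rough assumption about how the setting would be passed (the model name, SDK entry point, and `thinking_level` field are illustrative, not confirmed API details).

```python
# Sketch: choosing a Gemini 3 Flash "thinking level" per task.
THINKING_LEVELS = ("minimal", "low", "medium", "high")

def pick_thinking_level(task: str) -> str:
    """Heuristic: spend more reasoning budget on agentic/coding work."""
    heavy = ("refactor", "debug", "plan", "agent", "multi-step")
    return "high" if any(k in task.lower() for k in heavy) else "minimal"

level = pick_thinking_level("Refactor the payment module")
assert level in THINKING_LEVELS

# Hypothetical API usage (names are assumptions, not documented API):
# from google import genai
# client = genai.Client()
# resp = client.models.generate_content(
#     model="gemini-3-flash",
#     contents="Refactor the payment module",
#     config={"thinking_config": {"thinking_level": level}},
# )
print(level)  # -> high
```

The point of the heuristic is simply that reasoning depth is now a dial, not a model swap: simple lookups stay on Minimal for latency, while coding tasks opt into High.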
Technical specifications and benchmarks
The model features a robust 1-million token context window, matching the standard capacity of the Gemini 3 Pro. This allows for the seamless processing of massive datasets, including hours of video, entire code repositories, or hundreds of thousands of words in a single prompt.
Key performance metrics for Gemini 3 Flash include:
- GPQA Diamond (scientific knowledge): 90.4%
- MMMU-Pro (multimodal reasoning): 81.2%
- SWE-bench Verified (agentic coding): 78.0% (surpassing Gemini 3 Pro's 76.2%)
- Context window: 1,048,576 input tokens; 65,536 output tokens.
- Pricing: $0.50 per 1 million input tokens; $3.00 per 1 million output tokens.
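Taken together, the context-window and pricing figures above bound the cost of a single request. The following sketch computes that bound directly from the listed prices (the token counts are the published limits; no other pricing tiers are assumed).

```python
# Estimate a Gemini 3 Flash request cost from the published list prices:
# $0.50 per 1M input tokens, $3.00 per 1M output tokens.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 3.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for one request at list prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Worst case: a full 1,048,576-token prompt with a 65,536-token response.
cost = request_cost(1_048_576, 65_536)
print(f"${cost:.4f}")  # -> $0.7209
```

Even a maximally sized request stays under a dollar, which is the economic argument the article is making for Flash over Pro-class models.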
Empowering agentic workflows and coding
Google is positioning Gemini 3 Flash as the primary engine for "agentic" AI. Its high score on the SWE-bench Verified benchmark makes it an industry leader for autonomous coding tasks. Developers can also leverage the model's new visual reasoning capabilities, which can "zoom, count, and edit" visual inputs in real time.
The model’s efficiency is further enhanced by context caching, which can reduce costs by up to 90% for repeated high-volume queries. This makes it particularly attractive for enterprises using Google’s new agentic development platform, Google Antigravity, where rapid iteration and low-cost scaling are essential.
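To see what the caching claim means for a bill, the sketch below applies the article's "up to 90%" figure as a flat discount on the cached portion of a prompt. The discount rate and billing model here are simplifying assumptions; actual cached-token pricing may be structured differently.

```python
# Sketch: input-cost savings from context caching, assuming cached tokens
# are billed at a 90% discount (the article's "up to 90%" figure).
INPUT_PRICE_PER_M = 0.50  # USD per 1M input tokens

def input_cost(total_tokens: int, cached_tokens: int,
               discount: float = 0.90) -> float:
    """Input cost in USD when `cached_tokens` of the prompt hit the cache."""
    fresh = total_tokens - cached_tokens
    billed = fresh + cached_tokens * (1 - discount)
    return billed * INPUT_PRICE_PER_M / 1_000_000

# A 900k-token cached system context plus 100k tokens of new user input:
print(input_cost(1_000_000, 900_000))  # vs. 0.50 with no caching
```

For repeated queries against a large fixed context (a codebase, a document corpus), nearly all input tokens hit the cache, which is where the headline savings come from.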
Global availability and integration
Gemini 3 Flash is now rolling out globally as the default engine for the Gemini app and AI Mode in Google Search. For technical users, the model is available in preview via the Gemini API in Google AI Studio, Vertex AI, Gemini CLI, and Android Studio.
The launch also coincides with expanded access to Nano Banana Pro (the Gemini 3 Pro image generation model) in the U.S., allowing users to generate high-fidelity infographics and technical diagrams directly within their search results.
For more information on the recent release, please visit the official Gemini 3 Flash announcement.

