Google Releases Gemini 3.5 Flash

26-05-2026

Written by:

Jorick van Weelie

Marketing Lead at DataNorth | AI Enthusiast & Tech Storyteller

Published: May 20, 2026

Google announced Gemini 3.5 Flash at Google I/O 2026 on May 19, making it generally available the same day across the Gemini app, Google AI Studio, the Gemini API, and AI Mode in Google Search. Gemini 3.5 Flash is the first model in the new Gemini 3.5 family and delivers frontier-level intelligence at four times the output speed of comparable models, priced at $1.50 per million input tokens and $9.00 per million output tokens with a one-million-token context window. The model outperforms Google’s own Gemini 3.1 Pro on coding and agentic benchmarks, marking the first time a Flash-tier model has surpassed a Pro-tier model on these workloads.

What can Gemini 3.5 Flash do?

Gemini 3.5 Flash is positioned as Google’s strongest agentic and coding model to date. It scores 76.2% on Terminal-Bench 2.1 (up from 70.3% for Gemini 3.1 Pro), 83.6% on MCP Atlas for tool-use evaluation, and 1656 Elo on the GDPval-AA benchmark for economically valuable agentic work. On the Finance Agent v2 benchmark, Gemini 3.5 Flash reaches 57.9% compared to 43.0% for Gemini 3.1 Pro, a gap of nearly 15 percentage points.

The model accepts text, image, audio, and video input and produces text output. Dynamic thinking is enabled by default, and built-in tool-use capabilities include function calling, structured output, search-as-a-tool, and code execution. Its knowledge cutoff date is January 2026. When paired with Google’s updated Antigravity harness, Gemini 3.5 Flash can spin up multiple subagents working in parallel to tackle long-horizon, multi-step workflows.

Gemini 3.5 Flash benchmarks and technical specs

Gemini 3.5 Flash has a context window of 1,048,576 input tokens (approximately one million) and supports up to 65,536 output tokens. The API model ID is gemini-3.5-flash. Pricing sits at $1.50 per million input tokens, $9.00 per million output tokens, and $0.15 per million cached input tokens. This makes it roughly 40% cheaper than Gemini 3.1 Pro on both input and output, while outperforming it on most practical workloads.

On coding benchmarks, Gemini 3.5 Flash scores 76.2% on Terminal-Bench 2.1 and 55.1% on SWE-Bench Pro (Public). For multimodal understanding, it reaches 84.2% on CharXiv Reasoning and 83.6% on MMMU-Pro. Where the model trails is on pure abstract reasoning: it scores 40.2% on Humanity’s Last Exam (versus 44.4% for Gemini 3.1 Pro) and 72.1% on ARC-AGI-2 (versus 77.1%). Google positions Gemini 3.5 Pro, expected next month, as the better choice for those knowledge-intensive use cases.

How does Gemini 3.5 Flash compare to GPT-5.5 and Claude Opus 4.7?

Google frames Gemini 3.5 Flash as landing in the top-right quadrant of the Artificial Analysis Intelligence Index, combining frontier-level intelligence with Flash-tier speed. The model runs four times faster in output tokens per second than other frontier models, according to Google’s own measurements. At $1.50/$9.00 per million tokens, it undercuts the pricing of both GPT-5.5 ($5.00/$15.00 per million tokens on OpenAI’s standard tier) and Claude Opus 4.7 ($5.00/$25.00 per million tokens).

The trade-off is clear: Gemini 3.5 Flash gives up ground on dense academic reasoning and long-context recall at the 128k-token range, where Gemini 3.1 Pro still holds an advantage (84.9% vs 77.3% on MRCR v2 at 128k). For workloads centered on multi-step agent execution, tool calling, and coding tasks, Google’s benchmarks suggest Gemini 3.5 Flash is the strongest option in its price class.

Which companies are already using Gemini 3.5 Flash?

Google announced six launch partners with specific production use cases. Shopify uses parallel subagents for merchant growth forecasts. Macquarie Bank is piloting the model for customer onboarding over 100-plus-page financial documents. Salesforce integrates Gemini 3.5 Flash into Agentforce for multi-subagent enterprise task automation. Ramp uses multimodal OCR on complex invoices combined with reasoning over historical patterns. Xero deploys agents for multi-week workflows like 1099 tax form preparation. Databricks uses agentic monitoring and real-time retrieval across large datasets.

Gemini 3.5 Flash also powers Gemini Spark, Google’s new personal AI agent that runs continuously and takes action on a user’s behalf. Spark is rolling out to trusted testers now, with a beta planned for Google AI Ultra subscribers in the US. The model is also the new default for the Gemini app and AI Mode in Google Search globally.

Gemini 3.5 Flash availability and pricing

Gemini 3.5 Flash is generally available as of May 19, 2026. Developers can access it through Google AI Studio, the Gemini API, Android Studio, Google Antigravity 2.0, Vertex AI, and the Gemini Enterprise Agent Platform. Consumer access is available through the Gemini app and AI Mode in Google Search. Pricing is $1.50 per million input tokens and $9.00 per million output tokens on the global tier, with a 90% discount on cached input tokens at $0.15 per million. Non-global regions are priced slightly higher at $1.65/$9.90.Gemini 3.5 Flash is the most significant Flash-tier release Google has shipped.

The full announcement is available on the Google blog, and the evaluation methodology is published at Google Deepmind.