03/04/2026
On April 2, 2026, Google DeepMind released Gemma 4, a family of four open-weight models built on the same research behind Gemini 3. The models range from 2.3 billion to 30.7 billion parameters and ship under the Apache 2.0 license, making them freely available for commercial and sovereign AI deployments without usage restrictions.
Gemma 4 marks a significant shift from Google’s previous custom licensing approach and introduces native multimodal support across all model sizes, covering text, image, audio, and video inputs.
What Gemma 4 includes
The Gemma 4 family consists of four models: E2B (2.3 billion effective parameters), E4B (4.5 billion effective parameters), a 26B Mixture-of-Experts model with 3.8 billion active parameters, and a 31B dense model with 30.7 billion parameters. The smaller E2B and E4B variants support 128K context windows with text, image, and audio inputs. The larger 26B and 31B models extend to 256K context windows and add comprehension of video clips up to 60 seconds long.
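The family's headline specs can be collected into a quick-reference structure. This is a sketch assembled from the figures above; the keys and field names are illustrative, not an official schema:

```python
# Quick-reference summary of the Gemma 4 family, assembled from the
# figures quoted in the article. Field names are illustrative only.
GEMMA4_FAMILY = {
    "E2B": {"params_b": 2.3, "context_k": 128, "video": False},
    "E4B": {"params_b": 4.5, "context_k": 128, "video": False},
    "26B-MoE": {"params_b": 26.0, "active_params_b": 3.8,
                "context_k": 256, "video": True},
    "31B": {"params_b": 30.7, "context_k": 256, "video": True},
}

# The MoE model activates only a fraction of its weights per token:
moe = GEMMA4_FAMILY["26B-MoE"]
print(f"{moe['active_params_b'] / moe['params_b']:.0%} of parameters active")  # 15%
```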
All four models are available in both base and instruction-tuned versions. They support function calling, structured JSON output, and native system instructions, which enables developers to build autonomous agents that interact with external tools and APIs.
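The agent pattern this enables looks roughly like the following. The tool schema and wire format here are hypothetical (the exact format depends on the serving framework, such as Ollama or Vertex AI); the sketch only shows how a structured JSON function call emitted by a model would be parsed and routed:

```python
import json

# Hypothetical tool schema a developer might declare to an
# instruction-tuned Gemma 4 model via its system instructions.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(model_output: str) -> str:
    """Parse a structured JSON function call and route it to a tool."""
    call = json.loads(model_output)
    if call["name"] == "get_weather":
        # In a real agent this would hit an external weather API.
        return f"Weather lookup for {call['arguments']['city']}"
    raise ValueError(f"Unknown tool: {call['name']}")

# A model with native function calling would emit something like:
simulated_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch(simulated_output))  # Weather lookup for Zurich
```

The value of structured JSON output is exactly this: the model's reply can be parsed and validated mechanically rather than scraped out of free-form text.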
Performance improvements over Gemma 3
The benchmark improvements from Gemma 3 to Gemma 4 are substantial. On the AIME 2026 math benchmark, the 31B model scores 89.2%, up from 20.8% with Gemma 3. LiveCodeBench v6 coding performance jumps from 29.1% to 80.0%, and the GPQA science benchmark rises from 42.4% to 84.3%.
The 31B dense model currently ranks as the third-highest open model globally on the Arena AI text leaderboard, with an Elo rating of approximately 1452. The 26B MoE model holds the sixth position. On MMLU Pro, the 31B model achieves 85.2%, while on MMMU Pro for vision tasks it reaches 76.9%.
On-device and edge deployment
Google designed the smaller Gemma 4 models specifically for on-device use. The E2B model runs on a Raspberry Pi 5 at 133 tokens per second for prefill and 7.6 tokens per second for decode, using under 1.5 GB of memory with 2-bit quantization. The E4B model requires 12 to 16 GB of memory and runs well on Apple Silicon devices.
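Those throughput figures translate directly into end-to-end latency. A back-of-the-envelope sketch using the quoted rates for the E2B model on a Raspberry Pi 5:

```python
# Rough latency estimate for the E2B model on a Raspberry Pi 5,
# using the prefill/decode rates quoted above.
PREFILL_TPS = 133.0   # tokens/s while processing the prompt
DECODE_TPS = 7.6      # tokens/s while generating output

def latency_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Total time = prompt processing time + generation time."""
    return prompt_tokens / PREFILL_TPS + output_tokens / DECODE_TPS

# e.g. a 1,000-token prompt with a 200-token reply:
print(round(latency_seconds(1000, 200), 1))  # 33.8 (seconds)
```

Decode speed dominates: at 7.6 tokens per second, every 100 generated tokens adds roughly 13 seconds, which is why short-output tasks suit this class of hardware best.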
The larger models require more substantial hardware. The 26B MoE model needs around 24 GB of memory (a good fit for a single A100 GPU), while the 31B dense model requires 40 GB or more in bf16 precision, making it best suited for H100-class GPUs.
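These footprints are roughly what the weight arithmetic predicts. A simple sketch covering the weights alone (KV cache, activations, and runtime overhead come on top, which is why real requirements exceed these numbers):

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes).
    Excludes KV cache, activations, and framework overhead."""
    return params_billion * bits_per_param / 8

# 31B dense model in bf16 (16 bits per parameter):
print(weight_memory_gb(30.7, 16))   # 61.4 GB of weights alone

# E2B with 2-bit quantization:
print(weight_memory_gb(2.3, 2))     # 0.575 GB, consistent with the
                                    # sub-1.5 GB total quoted above
```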
Licensing and availability
Gemma 4 ships under the Apache 2.0 license, the same permissive license used by Qwen 3.5. This is a departure from Google’s previous custom Gemma license and means there are no monthly active user limits, no acceptable-use policy enforcement, and full freedom for commercial use.
The models are available immediately through Hugging Face, Google AI Studio, Ollama, Kaggle, Vertex AI, and Cloud Run. Google has also announced an AICore Developer Preview that brings Gemma 4 to Android devices natively.
Full technical details and model weights are available on the Google DeepMind Gemma 4 page and through the official Google blog announcement.