Google released Gemma 4 on April 2, the latest generation of its open-weights model family. The update brings multimodal capabilities (text, image, video, and audio input on smaller variants), native function calling, support for over 140 languages, and a significant licensing change: Gemma 4 ships under Apache 2.0, replacing the more restrictive Gemma license that governed previous versions.
The Apache 2.0 switch is arguably the most consequential change. It removes the commercial use restrictions that kept some enterprises from adopting Gemma in production, and puts the model family on equal licensing footing with Meta’s Llama and other permissively licensed open models.
## The Model Family
Gemma 4 ships in four variants spanning edge devices to server-grade deployments:
| Model | Parameters | Context | Modalities | Target |
|---|---|---|---|---|
| Gemma 4 E2B | ~2B (effective) | 128K | Text, image, video, audio | Mobile / edge |
| Gemma 4 E4B | ~4B (effective) | 128K | Text, image, video, audio | Mobile / edge |
| Gemma 4 26B | 26B | 128K+ | Text, image, video | Server |
| Gemma 4 31B | 31B | 128K+ | Text, image, video | Server |
The E2B and E4B designations refer to effective parameter counts — these are mixture-of-experts models where the total parameter count is higher but the active parameters per inference pass are in the 2B and 4B range. This matters for deployment: the models can run on mobile hardware and edge devices with computational budgets that would exclude a dense 26B model.
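The arithmetic behind this can be sketched in a few lines. The expert counts and sizes below are illustrative placeholders, not Gemma 4's actual architecture, but they show why a model with a large total parameter count can still have a small per-token compute footprint:

```python
# Illustrative sketch: why a mixture-of-experts model with ~2B *active*
# parameters fits edge hardware even though its *total* count is larger.
# Expert counts and sizes are made-up numbers, not Gemma 4's real config.

def moe_param_counts(shared_params, num_experts, expert_params, experts_per_token):
    """Return (total, active) parameter counts for a simple MoE layout."""
    # Shared parameters (embeddings, attention, router) run on every token;
    # only the routed experts contribute to the active count.
    total = shared_params + num_experts * expert_params
    active = shared_params + experts_per_token * expert_params
    return total, active

total, active = moe_param_counts(
    shared_params=1_000_000_000,   # always-active layers (hypothetical)
    num_experts=8,                 # experts stored in memory (hypothetical)
    expert_params=500_000_000,     # per-expert size (hypothetical)
    experts_per_token=2,           # experts routed per inference pass
)
# total = 5_000_000_000 stored; active = 2_000_000_000 used per token
```

Storage requirements still scale with the total count, but per-token FLOPs scale with the active count, which is what determines whether a mobile-class accelerator can sustain usable throughput.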
All variants natively process images and video. The two edge models additionally support audio input, making them capable of on-device speech recognition without a separate transcription pipeline. The larger models focus on reasoning-intensive tasks where audio processing can be handled upstream.
## What Changed from Gemma 3
The headline improvements are in reasoning and agentic capabilities. Google specifically optimized Gemma 4 for math, instruction following, and tool use — the capabilities that matter most for autonomous AI workflows where a model needs to plan multi-step actions and execute them through function calls.
Native function calling is a meaningful addition for the open-weights ecosystem. Previously, getting reliable tool use from open models required significant prompt engineering or fine-tuning. Gemma 4 supports it out of the box, which lowers the barrier for developers building agentic applications on open infrastructure rather than relying on proprietary API providers.
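The basic shape of such a tool-use loop is worth making concrete. The sketch below simulates the application side: a registry of callable tools and a dispatcher that executes whatever function call the model emits. The JSON structure of the model message here is a generic convention for illustration, not Gemma 4's documented output format, and `get_weather` is a stub invented for the example:

```python
import json

# Application-side half of a function-calling loop. The model message below
# is simulated; in a real deployment it would come from the model's response.

TOOLS = {
    # Stub tool for illustration; a real tool would hit an API or database.
    "get_weather": lambda city: {"city": city, "temp_c": 18},
}

def dispatch(tool_call):
    """Execute the tool named in a model's function-call message."""
    name = tool_call["name"]
    args = tool_call["arguments"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)

# Simulated model output requesting a tool invocation:
model_message = json.loads('{"name": "get_weather", "arguments": {"city": "Berlin"}}')
result = dispatch(model_message)
# `result` would then be serialized back to the model as a tool-response
# message, letting it incorporate the data into its next turn.
```

With native function calling, the model's side of this loop (deciding when to call, emitting well-formed arguments) works out of the box; the application only needs the dispatch-and-return plumbing shown here.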
The 140+ language support is another practical upgrade. Gemma 3 supported a narrower set of languages, which limited adoption in non-English markets. For enterprises operating across geographies, multilingual capability in an open model reduces the need for separate language-specific deployments.
## The Apache 2.0 Decision
Previous Gemma releases used a custom license that imposed restrictions on commercial use above certain user thresholds and required adherence to specific acceptable use policies. These restrictions were mild compared to some alternatives, but they created enough legal uncertainty that enterprise procurement teams — who tend toward conservative license interpretations — sometimes opted for Apache 2.0-licensed alternatives instead.
The switch to Apache 2.0 eliminates that friction. Organizations can now deploy, modify, and distribute Gemma 4 models with the same licensing clarity as any other Apache-licensed open source project. No user thresholds, no acceptable use review, no custom terms.
This positions Gemma 4 directly against Meta’s Llama family, which uses a permissive but custom license, and Alibaba’s Qwen models, which have gained significant traction in markets where open-weight alternatives to proprietary models are preferred. For Google, the calculus is straightforward: wider adoption of Gemma models grows the ecosystem around Google’s AI tooling, which drives usage of Google Cloud infrastructure.
## Where Gemma 4 Fits
The competitive landscape for open models has intensified considerably, and the proprietary side is moving just as fast — Microsoft launched three in-house multimodal models the same day Gemma 4 dropped. Qwen 3.5 models have established strong positions, particularly in multilingual and reasoning benchmarks. Llama 4 continues to dominate mindshare in the English-speaking developer community. DeepSeek’s models have pushed cost-efficiency boundaries.
Gemma 4’s strengths are in multimodal flexibility and the edge deployment story. The ability to run a model with image, video, and audio understanding on a mobile device — with native function calling — opens use cases that server-only models cannot address. Real-time on-device translation, visual question answering without cloud round-trips, and local voice interfaces become feasible with the E2B and E4B variants.
For server-side deployments, the 26B and 31B models compete on reasoning quality and tool use reliability. Early benchmarks suggest they are competitive with comparably sized models from Meta and Alibaba, though the open model space moves fast enough that benchmark leadership is measured in weeks rather than quarters.
## Availability
Gemma 4 is available through Google AI Studio, Hugging Face, Kaggle, Ollama, and the AI Edge Gallery for on-device deployment. NVIDIA has announced optimized versions for RTX hardware, which should improve local inference performance on consumer GPUs.
For organizations already running local AI inference or evaluating it, Gemma 4 represents a meaningful upgrade in what is possible without sending data to a cloud API. The combination of multimodal input, function calling, permissive licensing, and edge-viable model sizes makes it one of the most deployment-flexible open model families available.
