What is Gemma 3n E4B?
Google's open model that's small enough to run on a phone, capable enough to feel like a real assistant.
May 19, 2026 · 4 min read
The name, decoded
Gemma 3n E4B looks cryptic but reads cleanly once you split it. Gemma is Google DeepMind's family of open-weight models. 4 is the generation. E4B means effective 4 billion parameters — the model behaves like a 4B at inference time while using architectural tricks (sparse activation, smarter attention) to punch above that weight.
The "effective" matters. A traditional 4B model and an E4B model use roughly the same memory and compute, but the E4B variant produces noticeably better answers because not every parameter is active on every token.
Why open weights matter
Most of the well-known assistants — ChatGPT, Gemini, Copilot — run closed models you can only access through someone's API. Gemma 3n E4B is published under an open licence: the weights file is downloadable, inspectable, and runnable on your own hardware.
That changes three things at once:
- Portability. The same weights run on Android, Windows, macOS, and Linux.
- Auditability. Researchers can probe behaviour; bugs and biases can be reported and reproduced.
- Permanence. A model you have on disk cannot be deprecated, rate-limited, or paywalled away.
How it runs on real hardware
A raw 4B-parameter model in full precision is several gigabytes and slow on consumer chips. To make it phone-sized, the weights are quantised to INT4 — four bits per parameter. The file shrinks to roughly ~4 GB and inference speed roughly doubles, with a small quality cost most people won't notice in chat.
Each platform uses its native acceleration stack:
- Android — vendor NNAPI / GPU delegates via TensorFlow Lite.
- Windows —
DirectMLfor GPU/NPU. - macOS —
MLXon Apple Silicon, Metal underneath. - Linux — Vulkan compute or CPU SIMD via llama.cpp-style runtimes.
What you need to run it
Hardware bar, in plain numbers:
- Android. ~6 GB RAM, Android 9+, a chip from the last four years.
- Desktop. Any 64-bit Windows, macOS, or Linux box with 8 GB RAM. More RAM and a recent GPU/NPU help.
Disk: about 4 GB for the model and runtime. Network: only for the one-time download.
What it can and can't do
Gemma 3n E4B is strong at the things you do most: drafting, summarising, explaining, rewriting, basic coding help, reading text in screenshots. It will not replace a frontier 400B model for novel research or massive context windows.
For day-to-day assistant work, the gap is smaller than the marketing implies — and the model is yours, on your device, forever.
More from Localyze.ai: Why on-device AI matters · How to run an LLM on your phone · The cloud AI privacy myth · Download Localyze.ai