How to run an LLM on your phone

Real on-device AI on Android — no rooting, no terminal, no waiting for a cloud reply.

May 19, 2026 · 4 min read

What you'll actually get

A chat assistant that lives in your app drawer, answers in plain English, reads images, accepts voice — and never sends any of it to a server. Inference happens on the same chip that runs your camera filters.

Expect roughly conversational speed on a mid-range phone: 10 to 25 tokens per second running Gemma 3n E4B quantized to INT4. Flagships with an NPU are faster; the very oldest devices are slower.
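To make that range concrete, here is a rough back-of-the-envelope sketch in Python. The 150-token reply length is an assumption chosen for illustration (about a short paragraph or two), and the function name is hypothetical; only the 10 and 25 tokens-per-second figures come from the range quoted above.

```python
def reply_seconds(reply_tokens: int, tokens_per_second: float) -> float:
    """Estimate how long a reply takes to finish generating."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return reply_tokens / tokens_per_second


# Assumed reply length: 150 tokens (illustrative, not a measured figure).
slow = reply_seconds(150, 10)  # low end of the quoted range
fast = reply_seconds(150, 25)  # high end of the quoted range
print(f"{fast:.0f}-{slow:.0f} seconds")  # prints "6-15 seconds"
```

Because the text streams in as it is generated, you start reading well before the reply finishes, which is why this range feels conversational.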

Hardware that works

Three things matter, in order: RAM, chip generation, storage.
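One way to see why RAM comes first is to estimate how much memory the model weights alone would occupy at different precisions. This is a minimal sketch, not a spec: the 4-billion parameter count is an illustrative assumption rather than a figure from this article, the function name is hypothetical, and a real app also needs headroom for the context cache, the runtime, and Android itself.

```python
def weight_gigabytes(param_count: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    if param_count < 0 or bits_per_weight <= 0:
        raise ValueError("param_count must be >= 0 and bits_per_weight > 0")
    return param_count * bits_per_weight / 8 / 1e9


# Assumption for illustration only: a model with 4 billion weights in memory.
ASSUMED_PARAMS = 4e9

int4_gb = weight_gigabytes(ASSUMED_PARAMS, 4)   # 4-bit quantized weights
fp16_gb = weight_gigabytes(ASSUMED_PARAMS, 16)  # 16-bit weights, for comparison
print(f"INT4: ~{int4_gb:.1f} GB, FP16: ~{fp16_gb:.1f} GB")
```

Under these assumptions, 4-bit quantization cuts the weight footprint to a quarter of the 16-bit size, which is the difference between fitting in a phone's memory and not.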

iPhone users: on-device LLMs are coming, but App Store policy and the closed ML stack make the same install flow harder today.

The step-by-step

Using Localyze.ai as the example, but the shape is the same for any well-built on-device app:

That's the whole flow. There is no "connect your API key" step because there is no API.

What to try first

Good prompts to confirm everything is local:

The honest expectations

An on-device model is not a frontier cloud model. It will be slightly less clever at unusual reasoning tasks. It will not browse the web. The first token will sometimes take a second to appear while the model warms up.

In exchange you get privacy, offline use, a one-time payment, no rate limits, no account, and no telemetry. For most of how people actually use AI (write this, explain that, summarise this email, what's in this photo), the trade is worth it almost every time.

