Google Just Put A.I. Inside Your Phone. Permanently. No Internet Required. And I built an app to prove it.
- Jack Lau
- 21 hours ago
- 5 min read
A few days ago, Google quietly released something that I think most people walked past without fully understanding what just happened.
It is called Gemma 4. (https://ai.google.dev/gemma/docs/core/model_card_4)
It is a family of small AI models — and the smallest one, the E2B, is just over 3 gigabytes, roughly the size of a movie you download for a long flight. It can also interpret images and supports 140 languages.
That may not sound dramatic. But let me tell you what that means.
What Just Changed
Every AI tool you have used up to this point — ChatGPT, Gemini, Claude, Perplexity — works the same way. You type something. It leaves your device. It travels to a server farm somewhere. The AI thinks. The answer comes back. You read it.
That round trip is so fast you never notice it. But it requires three things: an internet connection, a server on the other end, and someone else's computer doing the actual thinking.
And in most cases, it requires a subscription. You pay monthly. You are renting intelligence.
Gemma 4 changes that equation completely.
The thinking happens on your device. The data never leaves. There is no server. There is no internet required. And it is completely free.
That is not a minor optimisation. That is a fundamentally different architecture.
The Secret Weapon Already Inside Your Phone
Here is something most people do not realise.
The latest smartphones — iPhone 15 and above, and recent Android flagships — contain a specialised chip called a Neural Engine. Apple has had it since 2017. It sits quietly inside your phone, purpose-built for one thing: running AI calculations at very high speed, with very low power consumption.
For years this chip was mostly used for things like Face ID, camera processing, and Siri. Useful, but not what it was truly capable of.
Gemma 4 is designed to run on exactly this kind of chip.
When I ran Gemma 4 on my iPhone 17 Pro Max, the responses were fast. Not desktop-fast, but genuinely fast enough to feel like a real conversation. The Neural Engine was doing the work — not a server in Virginia or Singapore. The chip that was already in your pocket, already paid for, already there.
This is why the combination of Gemma 4 and modern smartphones matters so much. The hardware has been ready for a while. Now the software has caught up.
It Is Free. Actually Free.
I want to be very clear about this because the word "free" gets abused a lot in technology.
This is not "free with a premium tier." It is not "free during beta." It is not "free up to X messages per day."
Google released Gemma 4 under an open licence. That means anyone — any developer, any researcher, any curious person — can download it, use it, build with it, at no cost. No API key. No account. No credit card.
Once the model is on your device, there is no ongoing cost. Ever. The AI is yours. It runs whether you have Wi-Fi or not. Whether you have a data plan or not. Whether Google's servers are up or not.
For most of us in Hong Kong and Singapore this feels like a nice-to-have. But think about a student in a rural part of Southeast Asia, or a professional in a country with expensive data. For them, this is not a nice-to-have. This is access to a world-class AI that was previously completely out of reach.
That matters.
Total Privacy. Not as a Feature. As a Fact.
When you use any cloud AI service, your conversation goes to their server. That is simply how it works. Most companies have privacy policies that say they will not misuse your data. Most of them probably mean it.
But "probably" and "policy" are not the same as "impossible."
When AI runs locally on your device, the privacy guarantee is not a policy. It is physics. The data does not travel. There is no server that could be hacked, subpoenaed, or sold. There is no company making decisions about what to do with your questions.
You asked. Your phone answered. Nobody else was in the room.
For anyone asking sensitive questions — about health, about finances, about personal situations — this is not a small thing. This is the difference between a private conversation and one that is technically witnessed.
I Built an App to Test This
When Gemma 4 was released, I did what I normally do — I stopped reading about it and started building with it.
Within a few hours, I had Gemma 4 running entirely on my iPhone 17 Pro Max. No cloud. No API key. No backend. The model loads directly into the phone's memory and runs inference locally, token by token, streamed to the screen in real time.
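On iOS the real loading would go through an on-device runtime (something like Google's MediaPipe LLM Inference API or Apple's MLX — I am not showing the actual app code here). But the shape of the loop is simple: load the weights into memory once, then stream tokens to the UI as they are decoded. A minimal Python sketch with a stubbed model standing in for the runtime, and a hypothetical weight filename:

```python
from typing import Iterator

def load_model(path: str) -> dict:
    """Stand-in for loading model weights into memory once at app start.
    A real on-device runtime would map the ~3 GB weight file here."""
    return {"path": path}

def generate(model: dict, prompt: str) -> Iterator[str]:
    """Stand-in for local inference: yields one token at a time,
    the way a streaming decode loop does on the Neural Engine."""
    canned = ["London ", "is ", "the ", "capital ", "of ", "the ", "UK."]
    for token in canned:
        yield token  # each token reaches the screen as soon as it is decoded

model = load_model("gemma-4-e2b.task")  # hypothetical filename
answer = []
for token in generate(model, "Where is London?"):
    answer.append(token)  # in the app, this appends to the chat bubble
print("".join(answer))
```

The point of the structure: the expensive step (loading) happens once, and the UI never waits for a full answer — it renders each token the moment the chip produces it.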
I built a simple chat interface around it. The persona I created is Professor OrdinaryAIJedi — a bilingual scholar who responds in both Traditional Chinese and English, covering science, philosophy, history, and yes, emotional support when needed.
Here is what happens when you ask a question:
You ask: "Where is London?"
Professor OrdinaryAIJedi answers — in Chinese first, then English — with context, a follow-up question to deepen understanding, the whole thing generated on the phone itself.
No Wi-Fi. No SIM data. Nothing leaving the device.
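A persona like this is usually nothing more than a system prompt prepended to every request. The wording and the wrapper format below are my own invention for illustration — real runtimes apply their own chat template — but the mechanism looks like this:

```python
SYSTEM_PROMPT = (
    "You are Professor OrdinaryAIJedi, a bilingual scholar. "
    "Answer every question first in Traditional Chinese, then in English. "
    "Cover science, philosophy, and history with depth, and respond with "
    "warmth to emotional or personal questions. End with one follow-up "
    "question that deepens the learner's understanding."
)

def build_prompt(system: str, user_message: str) -> str:
    """Combine the persona with the user's question in a simple
    tagged format; a real runtime would use its own chat template."""
    return f"<system>\n{system}\n</system>\n<user>\n{user_message}\n</user>"

print(build_prompt(SYSTEM_PROMPT, "Where is London?"))
```

Because the model runs locally, this persona is baked into the app itself — no server-side configuration, nothing to sync.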
The first time I unplugged the USB cable from my MacBook and watched it keep working — that was the moment.
I genuinely sat there for a few seconds.
How Well Does It Actually Work?
Honestly? Better than I expected for a model this size.
It holds a conversation. It remembers what was said earlier in the session. It switches between Chinese and English fluidly. It explains scientific concepts with real depth. When I tested it with emotional prompts — telling it I was stressed, overwhelmed — it responded with warmth and care before trying to give advice.
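That in-session memory is not magic: the app keeps the running transcript and resends it with every turn, trimming the oldest exchanges when the model's context window would overflow. A minimal sketch of that pattern — approximating tokens by word count, which is a deliberate simplification, not how a real tokenizer works:

```python
class ChatSession:
    """Keeps the conversation history and trims it to a context budget."""

    def __init__(self, context_budget: int = 2048):
        self.context_budget = context_budget  # rough token budget
        self.history: list[tuple[str, str]] = []  # (role, text) pairs

    def _cost(self) -> int:
        # Approximate tokens by word count -- illustration only.
        return sum(len(text.split()) for _, text in self.history)

    def add(self, role: str, text: str) -> None:
        self.history.append((role, text))
        # Drop the oldest turns until the transcript fits the budget again.
        while self._cost() > self.context_budget and len(self.history) > 1:
            self.history.pop(0)

    def prompt(self) -> str:
        """The full transcript the model sees on the next turn."""
        return "\n".join(f"{role}: {text}" for role, text in self.history)

session = ChatSession(context_budget=8)
session.add("user", "I feel stressed and overwhelmed today")
session.add("assistant", "I hear you, let's talk it through slowly")
session.add("user", "thank you so much")
print(session.prompt())
```

The trade-off is visible in the sketch: a small context budget means older turns fall away first, which is exactly why a 2-billion-parameter model remembers the session but not last week.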
Is it as good as GPT-4 or Claude Opus? No. Those are models with hundreds of billions of parameters running on massive server clusters. Gemma 4 E2B runs with roughly 2 billion effective parameters on a phone.
But here is the thing — for most everyday questions, most learning scenarios, most personal support needs, the gap is smaller than you would think. And what you gain in exchange — total privacy, zero cost, zero dependency on internet — is not nothing. For many use cases, it is everything.
Why This Is a Breakthrough
People have been running AI locally on desktop computers for a while now. That is already interesting. But a phone is different.
A phone is always with you. A phone has a camera. A phone has a microphone. A phone is deeply personal in a way a laptop never quite is.
Think about what becomes possible when capable AI lives permanently on that device:
A doctor in a rural clinic with no reliable internet, using an AI assistant that runs entirely offline.
A student anywhere in the world getting a world-class tutor on their phone for free, with no subscription and no data plan required.
A financial professional thinking through sensitive client situations without any of it ever touching a third-party server.
The Bigger Picture
Every few years there is a moment in technology where the pieces quietly assemble into something new. We are in one of those moments right now.
The cloud was not wrong. It was a necessary step. But the assumption that intelligence must live on someone else's server — that your data must travel before your question can be answered, that you must pay a subscription to think — that assumption is now optional.
Google releasing Gemma 4 is not just a product launch. It is a signal that one of the most serious AI labs in the world believes capable intelligence can and should run at the edge. On your device. Under your control. For free.
The chip is already in your pocket. The model is now small enough to fit beside it.
The tools are free. The privacy is total. The potential is limited only by your imagination.