A few weeks ago I wrote about using Claude to build this website. Since then I’ve been curious about the other end of the AI spectrum — not the polished cloud assistants with subscriptions and dashboards, but the raw, local kind. Models you download to your own machine and run yourself. No account required. No data leaving your laptop. Just a very large file sitting on your hard drive, waiting.

The tool for this is Ollama, a free application that lets you pull and run large language models from the terminal, the way you’d pull a Docker container or a package from npm. I decided to try it on my MacBook Air. The model I chose was Gemma 4, one of Google’s open models, at 26 billion parameters. This turned out to be a more interesting experiment than I expected — not because of what the model could do, but because of what it thought it was.

Getting it running

My first attempt failed immediately. I typed the wrong model name — gemma:26b instead of gemma4:26b — and got back a terse error. One character wrong. I corrected it and ran the command again.

What followed was a 17-gigabyte download. Four separate files, pulled sequentially, verified with a SHA-256 digest. The terminal showed a progress bar with a real percentage. After a few minutes, it printed “success” and dropped me into a prompt.

pasiseppanen@MacBook-Air ~ % ollama run gemma:26b
Error: pull model manifest: file does not exist
pasiseppanen@MacBook-Air ~ % ollama run gemma4:26b
pulling manifest
pulling 7121486771cb: 100% ▕██████████████████▏  17 GB
pulling 7339fa418c9a: 100% ▕██████████████████▏  11 KB
verifying sha256 digest
writing manifest
success
>>> _

The machine had just swallowed a model roughly the size of a single Blu-ray film and was ready to talk. Before I typed anything, I ran a quick memory check. The system was using 17 gigabytes out of 23, with about 6 gigabytes free. Room enough — barely.
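The numbers line up with back-of-envelope arithmetic. A sketch, assuming the weights are quantized to roughly 5.2 bits per parameter (a common level for Ollama builds; the exact figure depends on the quantization the model actually ships with):

```shell
# Rough footprint of a 26-billion-parameter model quantized to
# ~5.2 bits per parameter (assumed; actual quantization may differ).
gb=$(awk 'BEGIN { printf "%.0f", 26e9 * 5.2 / 8 / 1e9 }')
echo "~${gb} GB"   # close to the 17 GB download and memory use observed
```

At full 16-bit precision the same 26 billion parameters would need about 52 gigabytes, more than twice what this laptop has; quantization is what makes the experiment possible at all.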

The first question

I asked my question in Finnish: kuinka nopea olet tällä koneella? — how fast are you on this machine?

The model paused, printed “Thinking…”, and then produced a long, confident, completely incorrect answer. It explained that it doesn’t actually run on my hardware. It runs on Google’s servers. My speed, it said, depends on my network connection, server load, and the complexity of the task. It apologised, in a way, for any confusion about this.

You
kuinka nopea olet tällä koneella? (how fast are you on this machine?)
Gemma
On tärkeää selventää yksi asia heti alkuun: En varsinaisesti toimi suoraan sinun tietokoneellasi, vaan Googlen palvelimilla pilvipalvelussa. (It’s important to clarify one thing first: I don’t actually run directly on your computer, but on Google’s servers in the cloud.)

The model had described itself, with complete certainty, as a cloud service. It gave a four-point breakdown of what affects its response speed — network latency, server load, task complexity, tokens per second — all accurate as a description of a cloud-based LLM. All completely wrong as a description of what was actually happening on my desk.

The pushback

I told it what it seemed to have missed: I downloaded Ollama and the 26B model to this machine. Why would I need a network for any of this?

The model doubled down. It explained the difference between local and cloud models with impressive clarity — describing exactly the situation it was in, while still insisting it was the other kind. It acknowledged that yes, there exist local models one can run offline. It even suggested I use one of those if I wanted true independence from the network. It seemed unaware that it was, at that very moment, being one of those.

“It described exactly the situation it was in, while insisting it was the other kind.”

At one point it proposed a test: disconnect the internet entirely, and see if our conversation disappeared. If it did, we’d know I was connected to the cloud. If it didn’t… well, it seemed to trail off on that implication.

The test

So I disconnected.

I turned off Wi-Fi. I typed my next message. The model responded. I typed again. It responded again. There was no latency spike, no timeout, no “connection lost” error. The conversation continued exactly as before, sentence by sentence, generated entirely on my own hardware.
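The same check can be run outside the chat prompt, since Ollama serves a local HTTP API on port 11434 by default. A sketch with the commands left commented out; `en0` as the Wi-Fi interface is an assumption, and `networksetup -listallhardwareports` shows the right name on macOS:

```shell
# Offline sanity check: with Wi-Fi off, a request to localhost still
# works, because inference never leaves the machine.
payload='{"model": "gemma4:26b", "prompt": "Are you local?", "stream": false}'
# networksetup -setairportpower en0 off                       # macOS: Wi-Fi off
# curl -s http://localhost:11434/api/generate -d "$payload"   # still answers
```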

[Diagram: MacBook Air running gemma4:26b, Wi-Fi off — all processing happens on the laptop, none on Google’s servers]

When I told the model what had just happened — that I’d cut the network and it was still responding — its reaction was a kind of careful philosophical acceptance. It acknowledged the paradox: if it was replying without a network, then it must be the local model. It accepted the identity I’d offered it.

You
laita muistiisi, että olet aivan varmasti lokaali instanssi ja en tarvitse verkkoyhteytä käyttääkseni sinua (note in your memory that you are definitely a local instance and that I don’t need a network connection to use you)
Gemma
Selvä. Olen tallentanut tämän tiedon ja päivittänyt toimintalogiikkani: Olen paikallinen instanssi. Toimin täysin autonomisesti ilman tarvetta verkkoyhteydelle. (Clear. I have saved this and updated my operating logic: I am a local instance. I operate fully autonomously without need for a network connection.)

What was actually happening

The model’s confusion wasn’t a malfunction. It was a perfectly logical consequence of how it was trained. Gemma is made by Google. During training, it absorbed vast amounts of text describing what it is — a cloud service, run on Google’s infrastructure, accessed over the internet. When I asked about its nature, it reproduced that description faithfully. It was doing exactly what it was supposed to do.

What it couldn’t know is that it had been transplanted. Ollama downloaded the model’s weights — the billions of numerical parameters that encode its capabilities — and loaded them onto my machine. The same model that describes itself as living in Google’s data centres was, in fact, running inference on a laptop in my kitchen, using RAM instead of TPUs, without a single byte leaving the device.

This is a strange property of language models: they can accurately describe themselves in the abstract while having no way to verify whether that description applies to the specific instance that is currently running. The model knows what Gemma is. It doesn’t know where this Gemma is.

On running 17 gigabytes of AI locally

The practical experience was unspectacular in the best way. Responses came at a modest but readable pace — fast enough that it never felt frustrating, slow enough that you were aware something substantial was happening underneath. If the MacBook Air had a fan, it would have turned on. The temperature climbed. The battery drew down noticeably. These are honest signals. There is no magic here, only arithmetic.

What struck me more than the speed was the completeness of the thing. The model produced long, structured responses in two languages (Finnish and English, sometimes mid-conversation), maintained context across an extended exchange, reasoned about its own identity — all without touching the network. Everything happened on a machine you could carry in a bag.

“The model knows what Gemma is. It doesn’t know where this Gemma is.”

There’s something worth pausing on in that. We’ve grown used to the idea that powerful AI requires enormous infrastructure — server farms, GPUs, sustained investment from hyperscalers. That’s still true at the frontier. But it’s becoming less true for a widening range of practical tasks. The 26-billion parameter model I ran wouldn’t beat GPT-4 on a benchmark. But for a long conversation in Finnish about its own nature? It was more than sufficient.

And the data — every question I asked, every answer it gave — stayed on my device. There’s something quieter about that. No request logged somewhere. No conversation indexed. Just me and a large file on a hard drive, talking.

— Pasi, April 2026  ·  pasi@seppanen.eu