A real language model, downloaded once and run entirely on your own device with WebGPU. No server, no API key, no per-message cost — and your conversation stays 100% private.
🔒 fully private · 💸 zero cost · 📴 offline after load
Pick a model and press Load. First load downloads it once, then caches.
Real AI, no cloud and no cost
PocketLLM loads a compact open-source language model and runs it directly on your computer's graphics hardware through WebGPU. Because the model lives in your browser, there's no server to pay for, no API key to manage, and nothing to leak — every message you type is processed locally and then forgotten when you close the tab.
What it's good at
These small models (0.5–1.5 billion parameters) are quick and genuinely useful for everyday tasks: answering questions, drafting emails and messages, brainstorming, rewriting and summarising text, and explaining concepts. For heavy reasoning or coding you'll still want a large cloud model — but for fast, private help, on-device is hard to beat.
Private by design
Nothing is uploaded and there's no account. Once the model has cached, you can even disconnect from the internet and keep chatting. It's a working demonstration of where browser AI is in 2026: capable models, running for free, on hardware you already own.
Frequently asked questions
Where does the AI run?
Entirely on your own device, via WebGPU, inside the browser tab. No server, no API key — your messages never leave your computer.
Is it really free?
Yes. The model runs on your GPU, so there's no per-message fee. It downloads once, caches, and runs offline thereafter.
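The "downloads once, caches" behaviour can be pictured as a cache-first fetch. The sketch below is illustrative only: this page does not say how PocketLLM stores its files, and the names loadCached and CacheLike are hypothetical. It assumes something shaped like the browser's Cache API (caches.open, match, put), passed in as a parameter so the logic can be read and tested on its own.

```typescript
// Minimal shape of a cache: a subset of the browser's Cache interface.
interface CacheLike<R> {
  match(key: string): Promise<R | undefined>;
  put(key: string, value: R): Promise<void>;
}

// Hypothetical helper (not PocketLLM's actual code): return the cached copy
// if there is one, otherwise download once and store a clone for next time.
async function loadCached<R extends { ok: boolean; clone(): R }>(
  url: string,
  cache: CacheLike<R>,
  download: (url: string) => Promise<R>,
): Promise<{ response: R; fromCache: boolean }> {
  const hit = await cache.match(url);
  if (hit) return { response: hit, fromCache: true }; // no network needed

  const fresh = await download(url);
  if (!fresh.ok) throw new Error(`download failed: ${url}`);

  // Store a clone so the caller can still read the original body.
  await cache.put(url, fresh.clone());
  return { response: fresh, fromCache: false };
}

// In a browser this could be wired up roughly as follows
// (the cache name "model-cache" is an assumption):
//   const cache = await caches.open("model-cache");
//   const { response } = await loadCached(modelUrl, cache, fetch);
```

On the second and later calls for the same URL the download function is never invoked, which is what lets the app keep working offline once the model is cached.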
What do I need?
A modern browser with WebGPU support (Chrome, Edge or Brave on desktop) on a reasonably recent computer. Larger models need more GPU memory; the smallest model can also run on capable phones.
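A page can check for WebGPU before offering a model to load. The sketch below is a minimal example, not PocketLLM's own code: checkWebGPU and the small interfaces are hypothetical names, declared locally so the snippet compiles without extra type packages. The calls they stand in for, navigator.gpu and requestAdapter(), are part of the standard WebGPU API.

```typescript
// Minimal structural types standing in for the WebGPU objects used here.
interface AdapterLike {
  requestDevice(): Promise<unknown>;
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGPUStatus = "ready" | "no-api" | "no-adapter";

// Hypothetical helper: reports whether WebGPU looks usable.
// "no-api"     -> the browser does not expose navigator.gpu at all
// "no-adapter" -> the API exists but no suitable GPU adapter was returned
async function checkWebGPU(nav: NavigatorLike): Promise<WebGPUStatus> {
  if (!nav.gpu) return "no-api";
  const adapter = await nav.gpu.requestAdapter();
  return adapter ? "ready" : "no-adapter";
}

// In a browser:
//   checkWebGPU(navigator as NavigatorLike).then((status) => console.log(status));
```

Separating "no-api" from "no-adapter" lets the page show a more useful message: the first usually means switching browsers, the second usually points to the device or its drivers.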
How does it compare to ChatGPT?
These are small models: great for quick questions, drafting and summarising, but weaker than large cloud models on complex reasoning. The upside is total privacy and zero cost.
Note: requires a browser with WebGPU (Chrome, Edge or Brave on desktop). The first load downloads the model (~350 MB to ~1.1 GB, depending on your choice); this is a one-time download that is then cached. Responses are generated by a small model and can be wrong or made up, so verify anything important. Not for medical, legal or financial advice.