Question 1

Where does the AI actually run?

Accepted Answer

Entirely on your own device. PocketLLM uses WebGPU to run a quantized open-source language model inside your browser tab. No server processes your chats and no API key is needed, so your messages never leave your computer.

Question 2

Is it really free with no API cost?

Accepted Answer

Yes. Because the model runs on your device's GPU, there is no per-message API fee. The model downloads once (a few hundred MB to ~1 GB depending on which model you choose), is cached in your browser, and then runs offline at no cost.

Question 3

What do I need to run it?

Accepted Answer

A modern desktop browser with WebGPU — Chrome, Edge or Brave on a reasonably recent computer. Bigger models need more GPU memory. Phones can run the smallest model but desktops are far smoother.
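If you want to check whether a browser meets this requirement, the following is a minimal sketch of WebGPU feature detection. It relies on the standard `navigator.gpu` entry point and its `requestAdapter()` method; the function names and the small `NavigatorLike` type are hypothetical helpers introduced here, and this is not necessarily how PocketLLM itself performs the check.

```typescript
// Minimal structural types so the sketch compiles without the @webgpu/types package.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Quick synchronous check: does the browser expose the WebGPU API at all?
function hasWebGPUApi(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined && nav.gpu !== null;
}

// Fuller check: the API exists AND the browser can hand back a GPU adapter.
// requestAdapter() resolves to null when no suitable adapter is available.
async function supportsWebGPU(nav: NavigatorLike): Promise<boolean> {
  if (!hasWebGPUApi(nav)) return false;
  try {
    const adapter = await nav.gpu!.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure while requesting an adapter as "not supported".
    return false;
  }
}
```

In a browser you would call `supportsWebGPU(navigator as NavigatorLike)` and show a fallback message when it resolves to `false`.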

Question 4

How does it compare to ChatGPT?

Accepted Answer

PocketLLM's models are small (0.5–1.5 billion parameters), so they're great for quick questions, drafting, brainstorming and summarising, but less capable than large cloud models such as ChatGPT at complex reasoning. The trade-off is total privacy and zero cost.
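As a rough sanity check on the sizes mentioned in this FAQ, the weights of a quantized model take approximately (parameters × bits per weight ÷ 8) bytes. The sketch below assumes 4-bit quantization, which this page does not state; the function name is hypothetical, and real downloads also include extra files such as the tokenizer.

```typescript
// Approximate size of model weights in megabytes (1 MB = 1,000,000 bytes).
// bitsPerWeight = 4 is an assumption for illustration, not a documented PocketLLM setting.
function estimateWeightsMB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e6;
}

const smallest = estimateWeightsMB(0.5e9, 4); // 250 MB
const largest = estimateWeightsMB(1.5e9, 4);  // 750 MB
console.log(`Estimated weights: ${smallest} MB to ${largest} MB`);
```

Under that assumption the estimate of 250 MB to 750 MB is in line with the "few hundred MB to ~1 GB" download size quoted in the cost answer.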

A private AI chat that runs in your browser

Real AI, no cloud and no cost

What it's good at

Private by design

Frequently asked questions