Why a local LLM?
Cloud AI is convenient. Local AI is private.
Complete privacy
Your conversations never leave your device. No server receives your messages.
Works offline
Once the model is loaded, no internet connection is required to chat.
No API key
No account, no billing, no rate limits. The model runs entirely in your browser.
Free forever
Local inference has no cost. Use it as much as you want, for as long as you want.
Browser & hardware requirements
WebGPU is available on most modern hardware; a quick way to verify support is sketched after the list below.
Browser: Chrome 113+ or Edge 113+ (WebGPU required)
GPU: Apple Silicon (M1 or later) or a recent discrete GPU (NVIDIA/AMD)
Memory: at least 4 GB of available GPU memory (VRAM, or unified memory on Apple Silicon)
WebGPU: must be enabled (it is enabled by default on supported hardware)
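If you want to confirm support before loading a model, the standard WebGPU entry point is navigator.gpu. A minimal check might look like this (TypeScript; with strict typings, the official @webgpu/types package provides the navigator.gpu declarations):

```ts
// Minimal WebGPU support check. Browsers without WebGPU simply
// lack navigator.gpu.
async function hasWebGPU(): Promise<boolean> {
  if (!("gpu" in navigator)) {
    console.warn("This browser does not support WebGPU.");
    return false;
  }
  // requestAdapter() resolves to null when no usable GPU is found,
  // e.g. when WebGPU is disabled or the hardware is unsupported.
  const adapter = await navigator.gpu.requestAdapter();
  if (adapter === null) {
    console.warn("WebGPU is present, but no suitable GPU adapter was found.");
    return false;
  }
  return true;
}
```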
Supported models
Llama 3.2 1B (available now)
Meta Llama 3.2 1B Instruct, quantized for WebGPU. Loads in ~1-2 minutes on supported hardware.
More models coming soon (Llama 3.2 3B, Phi-3.5 Mini, Gemma 2B).
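The app's own loader isn't shown here, but as a rough sketch: the open-source WebLLM library (@mlc-ai/web-llm) is one common way to run quantized Llama models over WebGPU. Assuming that library and its prebuilt model ID (both assumptions; this app may use a different runtime or ID), loading and chatting looks roughly like this:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// WebLLM's prebuilt ID for a 4-bit quantized Llama 3.2 1B Instruct
// build (assumption for illustration; check the library's model list).
const MODEL_ID = "Llama-3.2-1B-Instruct-q4f16_1-MLC";

async function main(): Promise<void> {
  // The first call downloads and compiles the quantized weights;
  // the progress callback is where a loading bar would hook in.
  const engine = await CreateMLCEngine(MODEL_ID, {
    initProgressCallback: (report) => console.log(report.text),
  });

  // OpenAI-style chat completion, running entirely on-device.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello!" }],
  });
  console.log(reply.choices[0].message.content);
}

main();
```

The initial download and compile step is where the ~1-2 minute figure comes from; later loads can be served from the browser cache, so subsequent sessions start much faster.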