Local LLM via WebGPU

Run an LLM locally in your browser.

Zero data leaves your device.

FanChat uses WebGPU to run Llama 3.2 1B directly in your browser tab. No API key, no server, no data transmission. Pure on-device inference.
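FanChat's internals aren't published on this page, but in-browser inference of this kind is typically built on a WebGPU runtime such as the open-source web-llm library. The following is a minimal sketch under that assumption; the model ID and API shown are web-llm's, not necessarily FanChat's:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function chatLocally(prompt: string): Promise<string> {
  // The first call downloads the quantized weights and compiles the
  // WebGPU shaders; later calls reuse the browser cache.
  const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC");

  // OpenAI-style chat completion, computed entirely on-device.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0].message.content ?? "";
}
```

No request ever goes to a remote server: the download above fetches static model files once, and all token generation happens on your GPU.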

Why local LLM?

Cloud AI is convenient. Local AI is private.

Complete privacy

Your conversations never leave your device. No server receives your messages.

Works offline

Once the model is loaded, no internet connection is required to chat.

No API key

No account, no billing, no rate limits. The model runs entirely in your browser.

Free forever

Local inference has no cost. Use it as much as you want, for as long as you want.

Browser & hardware requirements

WebGPU is available on most modern hardware.

Browser: Chrome 113+ or Edge 113+ (WebGPU required)
GPU: Apple Silicon (M1 or later) or a recent discrete GPU (NVIDIA/AMD)
Memory: At least 4 GB of available GPU memory
WebGPU: Must be enabled (it is on by default on supported hardware)
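These requirements can also be checked at runtime. A minimal sketch using the standard WebGPU browser API (navigator.gpu):

```ts
// Probe for WebGPU support before offering the local model.
// (TypeScript needs the @webgpu/types package for navigator.gpu.)
async function webgpuAvailable(): Promise<boolean> {
  if (!("gpu" in navigator)) return false; // API not exposed by this browser
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null; // null means no usable GPU adapter was found
}
```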

Supported models

Llama 3.2 1B (available now)

Meta Llama 3.2 1B Instruct. Quantized for WebGPU. Loads in ~1-2 minutes on supported hardware.
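Because the initial load takes a minute or two, apps usually surface download progress. Assuming a web-llm-style runtime as above, its init progress callback can drive a loading indicator; updateLoadingBar below is a hypothetical helper, and the report fields shown are web-llm's:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Hypothetical UI hook; replace with your own progress component.
function updateLoadingBar(fraction: number, status: string): void {
  console.log(`${Math.round(fraction * 100)}% ${status}`);
}

const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (report) => {
    // report.progress runs from 0 to 1; report.text is a status string.
    updateLoadingBar(report.progress, report.text);
  },
});
// engine is now ready for chat completions.
```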

More models coming soon (Llama 3.2 3B, Phi-3.5 Mini, Gemma 2B).

Try it now. No setup needed.

Pick any AI character and select the local model in the provider settings.