Local LLM via WebGPU

Run an LLM locally in your browser.

Zero data leaves your device.

FanChat uses WebGPU to run Llama 3.2 1B directly in your browser tab. No API key, no server, no data transmission. Pure on-device inference.
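FanChat's internals aren't published on this page, but in-browser inference of this kind is typically built on a WebGPU runtime such as the open-source web-llm library. The following is a minimal sketch under that assumption; the model ID and API shown are web-llm's, not necessarily FanChat's:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function chatLocally(prompt: string): Promise<string> {
  // The first call downloads the quantized weights and compiles the
  // WebGPU shaders; later calls reuse the browser cache.
  const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC");

  // OpenAI-style chat completion, computed entirely on-device.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0].message.content ?? "";
}
```

No request ever goes to a remote server: the download above fetches static model files once, and all token generation happens on your GPU.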

Why local LLM?

Cloud AI is convenient. Local AI is private.

Complete privacy

Your conversations never leave your device. No server receives your messages.

Works offline

Once the model is loaded, no internet connection is required to chat.

No API key

No account, no billing, no rate limits. The model runs entirely in your browser.

Free forever

Local inference has no cost. Use it as much as you want, for as long as you want.

Browser & hardware requirements

WebGPU is available on most modern hardware.

Browser: Chrome 113+ or Edge 113+ (WebGPU required)
GPU: Apple Silicon (M1 or later) or a recent discrete GPU (NVIDIA/AMD)
Memory: At least 4 GB of available GPU memory
WebGPU: Must be enabled (it is on by default on supported hardware)
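These requirements can also be checked at runtime. A minimal sketch using the standard WebGPU browser API (navigator.gpu):

```ts
// Probe for WebGPU support before offering the local model.
// (TypeScript needs the @webgpu/types package for navigator.gpu.)
async function webgpuAvailable(): Promise<boolean> {
  if (!("gpu" in navigator)) return false; // API not exposed by this browser
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null; // null means no usable GPU adapter was found
}
```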

Supported models

Llama 3.2 1B (available now)

Meta Llama 3.2 1B Instruct. Quantized for WebGPU. Loads in ~1-2 minutes on supported hardware.
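Because the initial load takes a minute or two, apps usually surface download progress. Assuming a web-llm-style runtime as above, its init progress callback can drive a loading indicator; updateLoadingBar below is a hypothetical helper, and the report fields shown are web-llm's:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Hypothetical UI hook; replace with your own progress component.
function updateLoadingBar(fraction: number, status: string): void {
  console.log(`${Math.round(fraction * 100)}% ${status}`);
}

const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (report) => {
    // report.progress runs from 0 to 1; report.text is a status string.
    updateLoadingBar(report.progress, report.text);
  },
});
// engine is now ready for chat completions.
```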

More models coming soon (Llama 3.2 3B, Phi-3.5 Mini, Gemma 2B).

Try it now. No setup needed.

Pick any AI character and select the local model in the provider settings.