Self-host Ollama, point Claude Code at it
Run a 32B model on your own box. Point Claude Code at localhost. Zero cloud round-trips.
⚡ 7 min read · intermediate · OS: macOS, Linux · v0.5.4
Last updated May 06, 2026 · by xlrd
Prereq
- Mac M-series or Linux with 16GB RAM (32GB recommended)
- NVMe disk with 30GB free
- Claude Code or OpenCode already installed
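The RAM and disk prerequisites above can be checked from a terminal before installing anything. The following is a minimal sketch, not part of the original guide: `free_gb` and `total_ram_gb` are hypothetical helper names, and which directory ends up holding the models depends on your install, so pass whichever path applies.

```shell
# free_gb DIR: print whole gigabytes available on the filesystem holding DIR.
free_gb() {
  # -P forces the single-line POSIX format; -k reports 1024-byte blocks.
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# total_ram_gb: print installed RAM in gigabytes, rounded to the nearest whole
# number (Linux reports slightly less than the nominal size in /proc/meminfo).
total_ram_gb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024 + 0.5) }' /proc/meminfo
  else
    # macOS exposes the byte count through sysctl.
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
  fi
}

# Usage, with the thresholds from the prerequisite list:
#   [ "$(total_ram_gb)" -ge 16 ] || echo "under 16GB RAM" >&2
#   [ "$(free_gb "$HOME")" -ge 30 ] || echo "under 30GB free" >&2
```

Both helpers only read system information, so they are safe to run repeatedly.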
Steps
01. Install Ollama
bash
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama --version # 0.5.4+
02. Pull a coder model
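Before pulling, the 0.5.4+ floor from step 01 can be checked mechanically instead of by eye. This is a sketch under assumptions: `version_ge` is a hypothetical helper, it handles purely numeric dotted versions only, and the usage comment assumes `ollama --version` prints a dotted version somewhere in its output.

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
  _va=$1
  _vb=$2
  while [ -n "$_va" ] || [ -n "$_vb" ]; do
    _x=${_va%%.*}
    _y=${_vb%%.*}
    [ "${_x:-0}" -gt "${_y:-0}" ] && return 0
    [ "${_x:-0}" -lt "${_y:-0}" ] && return 1
    # Drop the field just compared, or empty the string when none remain.
    case $_va in *.*) _va=${_va#*.} ;; *) _va= ;; esac
    case $_vb in *.*) _vb=${_vb#*.} ;; *) _vb= ;; esac
  done
  return 0
}

# Usage:
#   v=$(ollama --version | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
#   version_ge "$v" 0.5.4 || echo "Ollama $v is older than 0.5.4" >&2
```

A numeric comparison matters here because a plain string comparison would rank 0.10.0 below 0.5.4.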
bash
ollama pull qwen3-coder:32b-q4_K_M # ~19GB, fast
# or
ollama pull deepseek-coder-v3:14b-q8 # ~15GB, higher precision
03. Serve the OpenAI-compat endpoint
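Before starting the server, it can be worth confirming that the pull from step 02 actually finished, since a multi-gigabyte download may be interrupted. The sketch below assumes `ollama list` prints a header row followed by one row per model with the tag in the first column; `has_model` is a hypothetical helper that reads that listing on stdin.

```shell
# has_model TAG: read `ollama list` output on stdin and succeed when TAG
# appears in the first (NAME) column. The header row is skipped.
has_model() {
  awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit !found }'
}

# Usage:
#   ollama list | has_model qwen3-coder:32b-q4_K_M || echo "model not pulled yet" >&2
```

Reading from stdin keeps the helper independent of the `ollama` binary, so it can be tried against saved output as well.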
bash
ollama serve # listens on 127.0.0.1:11434 by default; set OLLAMA_HOST to bind elsewhere
# OpenAI-compatible routes live at /v1/*
04. Point Claude Code at it
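Before exporting the variables below, it may help to confirm that the server from step 03 is answering, because `ollama serve` can take a moment to come up. This is a minimal sketch: `wait_for_ollama` is a hypothetical helper, the default address assumes an unchanged local install, and it probes the `/v1/models` listing route.

```shell
# wait_for_ollama [BASE_URL] [TRIES]: poll once per second until the server
# answers on /v1/models, or fail after TRIES attempts.
wait_for_ollama() {
  _base=${1:-http://127.0.0.1:11434}
  _tries=${2:-10}
  while [ "$_tries" -gt 0 ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -fsS --max-time 2 "$_base/v1/models" >/dev/null 2>&1; then
      return 0
    fi
    _tries=$((_tries - 1))
    [ "$_tries" -gt 0 ] && sleep 1
  done
  return 1
}

# Usage:
#   wait_for_ollama || echo "Ollama is not answering on port 11434" >&2
```

The helper only tells you the HTTP server is reachable; it does not load a model.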
bash
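# Claude Code speaks the Anthropic Messages API and appends /v1/messages to the base URL,
# so leave /v1 off. This needs an Ollama build that serves that route, or a translating proxy.
export ANTHROPIC_BASE_URL="http://localhost:11434"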
export ANTHROPIC_API_KEY="ollama" # ignored, but required
claude --model qwen3-coder:32b-q4_K_M
For OpenCode, set the ollama provider in opencode.json as shown in the multi-provider tutorial.
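Exported variables stay set for the rest of the shell session, so every later `claude` run in that terminal would also go to the local server. If that is not what you want, the override can be scoped to a single command. The sketch below uses a hypothetical helper, `run_with_endpoint`, which takes the base URL as its first argument and reuses the placeholder key from step 04.

```shell
# run_with_endpoint BASE_URL COMMAND [ARGS...]: run COMMAND with the two
# variables set for that one process only; the calling shell is unchanged.
run_with_endpoint() {
  _endpoint=$1
  shift
  env ANTHROPIC_BASE_URL="$_endpoint" ANTHROPIC_API_KEY="ollama" "$@"
}

# Usage, with BASE_URL set to the base URL used in step 04:
#   run_with_endpoint "$BASE_URL" claude --model qwen3-coder:32b-q4_K_M
```

Because `env` starts an external program, this works for the `claude` binary but not for shell functions or aliases.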
05. Verify with a real task
bash
cd ~/projects/your-app
claude "read src/index.ts and suggest one optimization"
First response in 2-4s on M3 Max. No cloud, no telemetry, no bill.
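To compare your own hardware against the 2-4s figure, the run can be timed. This is a rough sketch, not a benchmark: `elapsed_seconds` is a hypothetical helper with whole-second resolution, it measures the full command rather than time to first token, and the usage line assumes Claude Code's non-interactive `-p` mode.

```shell
# elapsed_seconds COMMAND [ARGS...]: run COMMAND, discard its output, and
# print the wall-clock time it took in whole seconds. The command's exit
# status is ignored, so check it separately if that matters.
elapsed_seconds() {
  _start=$(date +%s)
  "$@" >/dev/null 2>&1 || true
  _end=$(date +%s)
  echo $((_end - _start))
}

# Usage:
#   elapsed_seconds claude -p "read src/index.ts and suggest one optimization"
```

The first run after starting the server usually includes model load time, so a second run may give a more representative number.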
Next
- Self-hosted Ollama on a home server
- When to graduate from Ollama to vLLM