OVTH / 2026 LIVE · V0.4.0
Ø Overthinking Gateway ↗

Self-host Ollama, point Claude Code at it

Run a 32B model on your own box. Pipe Claude Code at localhost. Zero cloud round-trips.

⚡ 7min read intermediate OS · macos · linux v0.5.4 Last updated May 06, 2026 · by xlrd

Prereq

  • Mac M-series or Linux with 16GB RAM (32GB recommended)
  • NVMe disk with 30GB free
  • Claude Code or OpenCode already installed

Steps

01. Install Ollama

bash
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh

ollama --version  # 0.5.4+

02. Pull a coder model

bash
ollama pull qwen3-coder:32b-q4_K_M   # ~19GB, fast
# or
ollama pull deepseek-coder-v3:14b-q8  # ~15GB, higher precision

03. Serve the OpenAI-compat endpoint

bash
ollama serve  # defaults to 0.0.0.0:11434
# OpenAI-compatible routes live at /v1/*

04. Point Claude Code at it

bash
export ANTHROPIC_BASE_URL="http://localhost:11434/v1"
export ANTHROPIC_API_KEY="ollama"   # ignored, but required
claude --model qwen3-coder:32b-q4_K_M

For OpenCode, set the ollama provider in opencode.json as shown in the multi-provider tutorial.

05. Verify with a real task

bash
cd ~/projects/your-app
claude "read src/index.ts and suggest one optimization"

First response in 2-4s on M3 Max. No cloud, no telemetry, no bill.

Next

  • Self-hosted Ollama on a home server
  • When to graduate from Ollama to vLLM
Feedback · anonymous
Was this helpful?