Who this is for

You burn more than $50/mo on OpenRouter and have noticed that 5% markup is starting to show on the invoice. You already have keys for the underlying labs (Anthropic, OpenAI, Gemini). You are okay running a small Docker container on a box you own. You want one endpoint for every model and you want it to be yours.

If you are under $50/mo, stay on OpenRouter — the engineering time to migrate is not worth the arbitrage.

The migration, one diff

diff

- OPENAI_API_BASE=https://openrouter.ai/api/v1
- OPENAI_API_KEY=sk-or-v1-...
+ OPENAI_API_BASE=https://gateway.your-domain.dev/v1
+ OPENAI_API_KEY=ovth_...

That is the whole change for 90% of clients. SDKs, CLIs, IDEs — Cursor, Aider, OpenCode, Continue, Claude Code (via ANTHROPIC_BASE_URL) — all speak this shape.

Tools and versions

A VPS with 1GB RAM, anywhere (Hetzner CX11 is €4/mo, we run one)
Docker 24+ and docker compose
Caddy or any reverse proxy that terminates TLS
OVTH Gateway 0.9.x (Docker image: ghcr.io/nousresearch/ovth-gateway)
Your existing provider keys

Setup in five steps

01. Provision the box

bash

# Hetzner or your VPS of choice
ssh root@your-box
curl -fsSL https://get.docker.com | sh
mkdir -p /srv/ovth && cd /srv/ovth

02. Compose the Gateway

yaml

# /srv/ovth/docker-compose.yml
services:
gateway:
  image: ghcr.io/nousresearch/ovth-gateway:0.9
  restart: always
  ports: ["127.0.0.1:8080:8080"]
  env_file: .env
  volumes:
    - ./data:/data
    - ./config.yaml:/app/config.yaml:ro

yaml

# /srv/ovth/config.yaml
routes:
auto:
  strategy: cheapest-capable
  providers: [anthropic, openai, google]
cheap:
  providers: [google, anthropic]
  models: [gemini-1.5-flash, claude-haiku-4]
smart:
  providers: [anthropic, openai]
  models: [claude-opus-4.7, gpt-5]
quota:
daily_tokens: 500000
per_user: true

bash

# /srv/ovth/.env
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
OVTH_MASTER_KEY=$(openssl rand -hex 32)

03. Terminate TLS with Caddy

caddyfile

# /etc/caddy/Caddyfile
gateway.your-domain.dev {
reverse_proxy 127.0.0.1:8080
encode gzip zstd
header {
  Strict-Transport-Security "max-age=31536000"
  X-Frame-Options "DENY"
}
}

bash

docker compose up -d
systemctl reload caddy
curl https://gateway.your-domain.dev/v1/models | jq '.data | length'
# → 28 or so

04. Mint a per-app key and flip your clients

bash

curl -X POST https://gateway.your-domain.dev/admin/keys \
-H "Authorization: Bearer $OVTH_MASTER_KEY" \
-d '{"label":"cursor","quota":{"daily_tokens":100000}}'
# → {"key":"ovth_live_..."}

Rotate this key whenever. One Gateway, many apps, independent quotas.

05. Verify with a drop-in curl against both APIs

bash

# OpenAI shape
curl https://gateway.your-domain.dev/v1/chat/completions \
-H "Authorization: Bearer ovth_live_..." \
-d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'

# Anthropic shape — same gateway, different path
curl https://gateway.your-domain.dev/anthropic/v1/messages \
-H "x-api-key: ovth_live_..." \
-H "anthropic-version: 2023-06-01" \
-d '{"model":"claude-sonnet-4.5","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'

Both should return a real model response. If they do, you are done. Flip the env vars in your clients.

The 5% math looks clean on paper. In practice it falls apart in three cases, and you should be honest about which one you are in:

You use BYOK already. OpenRouter with BYOK is not 5% — it is $0 plus your underlying bill. In that case the only reason to self-host is multi-tenant routing, quotas, and audit logs. Those are real reasons, but the cost argument is not.
You use exotic models. OpenRouter’s long tail (Together, Fireworks, random Mistral variants) is wider than any single gateway. If you lean on obscure models, you are trading reach for margin.
You do not want to own infra. A VPS is a pager. At 2AM the Gateway goes down and you own the fix. OpenRouter has an SRE team. Sometimes the markup is paying for someone else to wake up.

Self-hosting wins when: you have steady $100+/mo spend, standard model mix (Claude, GPT, Gemini, a local Qwen), and the engineering cycles to own a small service. It loses when you are experimenting, cost-light, or single-tenant.

The real cost comparison: direct Anthropic keys + OVTH Gateway VPS (~$5/mo) vs OpenRouter (5% on top of the same keys). Break-even is ~$100/mo of spend. Below that, OpenRouter is the economically correct answer.

Cost, privacy, performance

OVTH Gateway onboarding — the managed version, no VPS needed
OpenCode with multiple providers — a client that shines behind the Gateway
Kiro as an OpenAI endpoint — a standalone reverse-proxy tutorial for the Kiro IDE backend

Self-hosting is the second derivative of control. The first was moving off ChatGPT Pro. This one is moving off the thing you moved onto. Own the endpoint, own the invoice, own the logs.

Who this is for

The migration, one diff

Tools and versions

Setup in five steps

01. Provision the box

02. Compose the Gateway

03. Terminate TLS with Caddy

04. Mint a per-app key and flip your clients

05. Verify with a drop-in curl against both APIs

Cost, privacy, performance

Related flash tutorials