Routing AI calls through OpenRouter
Point Claudium's docs-assistant + future AI features at OpenRouter so you can pick any model from 50+ providers with a single API key, unified billing, and easy provider failover.
The /help docs assistant (the in-app surface that composes a grounded answer from your installed docs) can talk to OpenAI directly, or it can route through OpenRouter — an OpenAI-compatible proxy that fans your calls out to any of 50+ underlying models (Anthropic, Google, Mistral, open-source, plus all the OpenAI models). One key, one bill, and you can swap models without touching code.
This guide is for operators of a Claudium hub. End users see no difference — the docs assistant just answers their question.
Why route through OpenRouter
- Single billing account across every LLM provider you might want to A/B test. No more juggling six dashboards.
- Model swap is an env var, not a deploy. Try gpt-4o-mini today, claude-3.5-sonnet tomorrow, llama-3.3-70b next week — flip
HELP_AI_MODELand the next request goes there. - Provider failover. If one upstream is rate-limited, OpenRouter can fall back to a comparable model automatically (configurable per-account).
- Spend caps + analytics built in. OpenRouter's dashboard shows per-model cost and lets you cap monthly spend without writing your own guardrail.
Configure the hub
Set two env vars on the hub:
OPENROUTER_API_KEY=sk-or-v1-...your-key-from-openrouter.ai...
HELP_AI_MODEL=openai/gpt-4o-mini
That's it. Restart the hub; the next /help question routes through OpenRouter.
The presence of OPENROUTER_API_KEY is the signal. If you also have OPENAI_API_KEY set, OpenRouter wins — it's the strictly-more-flexible option (you can still route to OpenAI models from there).
Model name format
OpenRouter expects provider/model. Pick from openrouter.ai/models:
| Use case | HELP_AI_MODEL |
|---|---|
| Cheapest, OpenAI quality (default) | openai/gpt-4o-mini |
| Best-in-class reasoning | anthropic/claude-3.5-sonnet |
| Cheapest open-source | meta-llama/llama-3.3-70b-instruct |
| Lowest latency | google/gemini-flash-1.5 |
Tune the cost guardrail
Claudium caps the help assistant's spend at $2/day per process by default. The projection assumes gpt-4o-mini rates ($0.150 / $0.600 per 1M tokens). If you point at a more expensive model, update the rates so the cap stays meaningful:
# Claude 3.5 Sonnet pricing as of 2026 — adjust to actual.
HELP_AI_USD_PER_M_INPUT=3
HELP_AI_USD_PER_M_OUTPUT=15
# Optionally bump the cap too if you want more headroom.
HELP_AI_DAILY_BUDGET_USD=10
The projection is linear in the configured rates, so 10× the rate × 10× the cap leaves your effective request budget unchanged.
Verify it's wired
After restart, ask the docs assistant a question on /help. The response body in the network panel includes a provider field:
{
"ok": true,
"answer": "…",
"model": "openai/gpt-4o-mini",
"provider": "openrouter"
}
provider: 'openrouter' confirms the routing took effect. Hub logs at boot also print one line:
help-ai: configured provider=openrouter
Switching back to OpenAI direct
Remove OPENROUTER_API_KEY (or unset it via your platform's env editor) and restart. The hub falls back to OPENAI_API_KEY if set, or to "AI synthesis disabled" if neither is configured. The /help UI handles the disabled state gracefully — keyword search still works.
What's NOT yet routed through OpenRouter
Two AI surfaces still pin to OpenAI direct by env, regardless of OPENROUTER_API_KEY:
- Embedding generation (
lib/embedding-worker.js—text-embedding-3-small). Pattern detection uses this; the call shape is different enough from chat completions that we kept it on the direct path until we've validated OpenRouter's embeddings endpoint at production volume. - Insights detection (
lib/insight-detect.js). Pure heuristic — no LLM in the loop today.
Both are queued for the next pass when we have OpenRouter usage data from /help to compare against.
Custom domain per org
Map a hostname like claudium.acme.com to a workspace. Two TLS recipes — Cloudflare proxy (zero-config) and Caddy reverse proxy (self-hosted).
Privacy posture
Exactly what leaves your machine, exactly what doesn't, and the secret-redaction filter that drops anything risky before it touches a viewer.