Claudium
Operations

Routing AI calls through OpenRouter

Point Claudium's docs-assistant + future AI features at OpenRouter so you can pick any model from 50+ providers with a single API key, unified billing, and easy provider failover.

The /help docs assistant (the in-app surface that composes a grounded answer from your installed docs) can talk to OpenAI directly, or it can route through OpenRouter — an OpenAI-compatible proxy that fans your calls out to any of 50+ underlying models (Anthropic, Google, Mistral, open-source, plus all the OpenAI models). One key, one bill, and you can swap models without touching code.

This guide is for operators of a Claudium hub. End users see no difference — the docs assistant just answers their question.

Why route through OpenRouter

  • Single billing account across every LLM provider you might want to A/B test. No more juggling six dashboards.
  • Model swap is an env var, not a deploy. Try gpt-4o-mini today, claude-3.5-sonnet tomorrow, llama-3.3-70b next week — flip HELP_AI_MODEL and the next request goes there.
  • Provider failover. If one upstream is rate-limited, OpenRouter can fall back to a comparable model automatically (configurable per-account).
  • Spend caps + analytics built in. OpenRouter's dashboard shows per-model cost and lets you cap monthly spend without writing your own guardrail.

Configure the hub

Set two env vars on the hub:

OPENROUTER_API_KEY=sk-or-v1-...your-key-from-openrouter.ai...
HELP_AI_MODEL=openai/gpt-4o-mini

That's it. Restart the hub; the next /help question routes through OpenRouter.

The presence of OPENROUTER_API_KEY is the signal. If you also have OPENAI_API_KEY set, OpenRouter wins — it's the strictly-more-flexible option (you can still route to OpenAI models from there).

Model name format

OpenRouter expects provider/model. Pick from openrouter.ai/models:

Use caseHELP_AI_MODEL
Cheapest, OpenAI quality (default)openai/gpt-4o-mini
Best-in-class reasoninganthropic/claude-3.5-sonnet
Cheapest open-sourcemeta-llama/llama-3.3-70b-instruct
Lowest latencygoogle/gemini-flash-1.5

Tune the cost guardrail

Claudium caps the help assistant's spend at $2/day per process by default. The projection assumes gpt-4o-mini rates ($0.150 / $0.600 per 1M tokens). If you point at a more expensive model, update the rates so the cap stays meaningful:

# Claude 3.5 Sonnet pricing as of 2026 — adjust to actual.
HELP_AI_USD_PER_M_INPUT=3
HELP_AI_USD_PER_M_OUTPUT=15
# Optionally bump the cap too if you want more headroom.
HELP_AI_DAILY_BUDGET_USD=10

The projection is linear in the configured rates, so 10× the rate × 10× the cap leaves your effective request budget unchanged.

Verify it's wired

After restart, ask the docs assistant a question on /help. The response body in the network panel includes a provider field:

{
  "ok": true,
  "answer": "…",
  "model": "openai/gpt-4o-mini",
  "provider": "openrouter"
}

provider: 'openrouter' confirms the routing took effect. Hub logs at boot also print one line:

help-ai: configured  provider=openrouter

Switching back to OpenAI direct

Remove OPENROUTER_API_KEY (or unset it via your platform's env editor) and restart. The hub falls back to OPENAI_API_KEY if set, or to "AI synthesis disabled" if neither is configured. The /help UI handles the disabled state gracefully — keyword search still works.

What's NOT yet routed through OpenRouter

Two AI surfaces still pin to OpenAI direct by env, regardless of OPENROUTER_API_KEY:

  • Embedding generation (lib/embedding-worker.jstext-embedding-3-small). Pattern detection uses this; the call shape is different enough from chat completions that we kept it on the direct path until we've validated OpenRouter's embeddings endpoint at production volume.
  • Insights detection (lib/insight-detect.js). Pure heuristic — no LLM in the loop today.

Both are queued for the next pass when we have OpenRouter usage data from /help to compare against.

On this page