The right model for every request.

ClawRouter intelligently routes your API requests to the optimal model based on cost, latency, and task complexity. One API key, every LLM, self-learning.

Measured on 60-case labeled eval set · 2026-02-20

87%
Routing accuracy
52 / 60 correct on labeled test set — fast, standard, and reasoning tiers
81%
Cost savings vs direct Opus
Blended across real traffic mix vs routing every call to Opus
18ms
Routing decision latency
p50 judge latency — p95 34ms, p99 48ms. Heuristic fallback <1ms
Response time by tier: fast (Haiku) ~200ms · standard (Sonnet) ~1.5s · reasoning (Opus) ~4s

See it in action

Watch a request get classified and routed through the pipeline in real time.

openclaw — routing demo
Request
Hard Gates
Local Judge
Score + Select
fast
Haiku  $
standard
Sonnet  $$
reasoning
Opus  $$$

The right model for every task

Your coding agent makes hundreds of calls per session. Most don't need the most expensive model.

fast Haiku
> Read file src/utils.ts
> Summarize this git diff
> What does this error mean?
~200ms response · $0.001 / call

File reads, summaries, classification, simple Q&A. 60% of agent calls are this simple.

standard Sonnet
> Add error handling to this endpoint
> Write tests for the auth flow
> Refactor this into smaller functions
~1.5s response · $0.01 / call

Code generation, test writing, refactoring. The workhorse for most coding tasks.

reasoning Opus
> Design the migration strategy
> Debug this race condition
> Architect the new auth system
~4s response · $0.08 / call

Architecture, debugging complex issues, multi-file reasoning. Reserved for when it matters.

Without routing, every call uses the same model. With ClawRouter, blended API costs drop 81% vs routing everything through Opus, measured on a real traffic mix.
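The arithmetic behind a saving like this is simple. The sketch below uses the per-call prices from the tier cards above and an assumed traffic mix (60% fast, 30% standard, 10% reasoning); the real production mix differs, which is why the measured number is 81% rather than the ~86% this toy mix gives:

```python
# Back-of-the-envelope blended cost. The traffic mix is a hypothetical
# illustration, not the measured mix behind the quoted 81% figure.
PRICE = {"fast": 0.001, "standard": 0.01, "reasoning": 0.08}   # $/call, from the tier cards
MIX   = {"fast": 0.60,  "standard": 0.30, "reasoning": 0.10}   # assumed traffic shares

blended  = sum(PRICE[t] * MIX[t] for t in PRICE)   # $/call when routed
all_opus = PRICE["reasoning"]                      # $/call if everything hits Opus
savings  = 1 - blended / all_opus

print(f"${blended:.4f}/call routed vs ${all_opus:.2f}/call Opus -> {savings:.0%} saved")
```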

5-stage routing pipeline

Every request sent with model: "auto" passes through an intelligent pipeline that classifies, scores, and learns from outcomes.

1

Hard Gates

Token limits, capability checks, tool-use detection. Sync, <1ms.

2

Local Judge

LLM classifier with circuit breaker + heuristic fallback. Async, <50ms.

3

Score + Select

Policy weights × judge output × EMA stats → best tier. Sync, <1ms.

4

Execute

Routes to selected model via LiteLLM with streaming, failover, and retries.

5

Learn

Fire-and-forget EMA update. Success rates and latency improve future routing.
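In sketch form, the five stages reduce to a single routing function. Everything below is an illustrative assumption for this page — the function names, gate thresholds, keyword lists, and EMA values are invented, not ClawRouter's actual internals:

```python
# Minimal runnable sketch of the 5-stage pipeline; all names, thresholds,
# and numbers here are illustrative assumptions.
TIERS = ("fast", "standard", "reasoning")

def hard_gates(req):
    # 1. Sync, <1ms: force a tier when the request clearly requires one.
    if req.get("tools"):                  # tool use -> at least standard
        return "standard"
    if len(req["prompt"]) > 50_000:       # very large context -> reasoning
        return "reasoning"
    return None

def local_judge(req):
    # 2. Async, <50ms in the real system; here a trivial keyword heuristic
    #    stands in for both the LLM judge and its fallback.
    hard = ("architect", "debug", "design", "migration")
    if any(w in req["prompt"].lower() for w in hard):
        return {"fast": 0.1, "standard": 0.4, "reasoning": 0.9}
    return {"fast": 0.8, "standard": 0.5, "reasoning": 0.2}

def score_and_select(judge_fit, ema):
    # 3. Sync, <1ms: judge fit x learned success-rate EMA -> best tier.
    return max(TIERS, key=lambda t: judge_fit[t] * ema[t])

def route(req, ema):
    tier = hard_gates(req) or score_and_select(local_judge(req), ema)
    # 4. Execute via the selected model, then 5. update EMA stats (elided).
    return tier

ema = {"fast": 0.97, "standard": 0.95, "reasoning": 0.92}
print(route({"prompt": "Summarize this git diff"}, ema))        # fast
print(route({"prompt": "Design the migration strategy"}, ema))  # reasoning
```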

Routing policies

Control the cost/quality tradeoff with the X-Routing-Policy header. Default: balanced.

cheap
cost 60% · quality 15% · latency 15% · capability 10%
Prefers fast → standard → reasoning

balanced (default)
cost 25% · quality 35% · latency 20% · capability 20%
Prefers standard → fast → reasoning

best
cost 5% · quality 60% · latency 10% · capability 25%
Prefers reasoning → standard → fast

low_latency
cost 10% · quality 15% · latency 60% · capability 15%
Prefers fast → standard → reasoning
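To see how the published weights produce those preference orders, here is a toy ranking. The policy weights are the ones listed above; the per-tier attribute scores are invented for illustration and are not ClawRouter's real scoring inputs:

```python
# Hypothetical per-tier attribute scores (0-1, higher = better on that axis).
# These numbers are assumptions chosen to illustrate the weighting, not data.
ATTRS = {
    "fast":      {"cost": 1.0, "quality": 0.40, "latency": 1.0, "capability": 0.5},
    "standard":  {"cost": 0.5, "quality": 0.85, "latency": 0.6, "capability": 0.8},
    "reasoning": {"cost": 0.1, "quality": 1.00, "latency": 0.2, "capability": 1.0},
}
# Policy weights as published above.
POLICIES = {
    "cheap":       {"cost": 0.60, "quality": 0.15, "latency": 0.15, "capability": 0.10},
    "balanced":    {"cost": 0.25, "quality": 0.35, "latency": 0.20, "capability": 0.20},
    "best":        {"cost": 0.05, "quality": 0.60, "latency": 0.10, "capability": 0.25},
    "low_latency": {"cost": 0.10, "quality": 0.15, "latency": 0.60, "capability": 0.15},
}

def rank(policy):
    # Weighted sum of attributes, sorted best-first.
    score = lambda tier: sum(POLICIES[policy][k] * ATTRS[tier][k] for k in ATTRS[tier])
    return sorted(ATTRS, key=score, reverse=True)

for name in POLICIES:
    print(f"{name:12s} prefers {' -> '.join(rank(name))}")
```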

Model tiers

Pick a tier or let auto decide for you.

Tier       Model                  Best for                                                             Relative cost
fast       Claude Haiku 4.5       Simple questions, classification, extraction                         $
standard   Claude Sonnet 4.6      Code generation, writing, analysis                                   $$
reasoning  Claude Opus 4.6        Complex reasoning, architecture, research                            $$$
auto       Routes intelligently   Any task — classified, scored, and routed via the 5-stage pipeline   optimal

Why use ClawRouter?

Cost-aware selection

The scoring engine evaluates each tier on fit, cost, latency, and capability. Simple questions go to Haiku, complex reasoning goes to Opus — automatically.

Local judge

An in-cluster LLM classifier analyzes request complexity in <50ms with circuit breaker protection. Falls back to keyword heuristics if the judge is slow or down.
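"Circuit breaker" here means the classic pattern: after repeated judge failures, skip the judge entirely for a cooldown window. A minimal sketch, with class name and thresholds assumed for illustration:

```python
# Hypothetical circuit breaker around the judge call; the thresholds and
# names are assumptions, not ClawRouter's actual implementation.
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures, self.reset_after = max_failures, reset_after
        self.failures, self.opened_at = 0, None

    def call(self, judge_fn, fallback_fn, prompt):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback_fn(prompt)            # open: skip the judge
            self.opened_at, self.failures = None, 0   # half-open: retry judge
        try:
            result = judge_fn(prompt)
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()     # trip the breaker
            return fallback_fn(prompt)                # keyword heuristic
```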

Self-learning

Every response updates per-model EMA statistics. Success rates and latency data continuously improve routing decisions over time.
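An exponential moving average weights recent outcomes more heavily than old ones. A sketch of such an update, with the alpha value and field names assumed for illustration:

```python
# Fire-and-forget EMA update sketch; alpha and the stat fields are assumptions.
def ema_update(stats, tier, success, latency_ms, alpha=0.1):
    s = stats[tier]
    # new = (1 - alpha) * old + alpha * observation
    s["success_rate"] = (1 - alpha) * s["success_rate"] + alpha * (1.0 if success else 0.0)
    s["latency_ms"]   = (1 - alpha) * s["latency_ms"]   + alpha * latency_ms

stats = {"fast": {"success_rate": 0.95, "latency_ms": 200.0}}
ema_update(stats, "fast", success=True, latency_ms=180.0)
print(stats["fast"])  # success rate drifts toward 1.0, latency toward 180ms
```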

Privacy controls

Configurable regex redaction scrubs API keys, emails, SSNs, and PEM certs from prompts before the judge sees them. Your data stays private.
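A redaction pass of this shape is easy to picture. The patterns below are simplified stand-ins, not ClawRouter's configured rules:

```python
# Illustrative redaction pass; these regexes are simplified assumptions.
import re

PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[API_KEY]"),          # API-key-shaped tokens
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),        # email addresses
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),            # US SSNs
    (re.compile(r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----", re.S), "[PEM]"),
]

def redact(text):
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Contact dev@example.com, key sk-abcdefghijklmnopqrstuv"))
# Contact [EMAIL], key [API_KEY]
```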

Automatic fallbacks

If a provider is down or rate-limited, ClawRouter retries with the next-best model automatically. Your app never sees the failure.
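The shape of that failover loop, with model names, error types, and ordering assumed for illustration:

```python
# Hypothetical failover loop; the ordering and error classes are assumptions.
class ProviderDown(Exception): pass
class RateLimited(Exception): pass

FAILOVER_ORDER = {
    "reasoning": ["claude-opus", "claude-sonnet", "claude-haiku"],
    "standard":  ["claude-sonnet", "claude-haiku"],
    "fast":      ["claude-haiku", "claude-sonnet"],
}

def call_with_failover(tier, send):
    """Try each candidate model in order; surface an error only if every
    candidate fails, so the app rarely sees a provider outage."""
    last_err = None
    for model in FAILOVER_ORDER[tier]:
        try:
            return send(model)
        except (ProviderDown, RateLimited) as err:
            last_err = err
    raise last_err
```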

OpenAI-compatible

Drop-in replacement for any OpenAI SDK or client. Just change the base URL and API key — your existing code works as-is.

Start in 60 seconds

Add this to your OpenClaw config file:

{
  "models": {
    "providers": {
      "clawrouter": {
        "baseUrl": "https://api.clawrouter.app/v1",
        "apiKey": "YOUR_API_KEY",
        "api": "openai-completions",
        "models": [
          { "id": "auto", "name": "Smart Route (auto-select best model)" },
          { "id": "fast", "name": "Fast (Haiku 4.5)" },
          { "id": "standard", "name": "Standard (Sonnet 4.6)" },
          { "id": "reasoning", "name": "Reasoning (Opus 4.6)" }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "clawrouter/auto" }
    }
  }
}

Make a request with a routing policy:

curl https://api.clawrouter.app/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Routing-Policy: balanced" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Explain quicksort in one sentence"}
    ]
  }'

# Response headers:
# x-gateway-routed-tier: fast
# x-gateway-judge-source: llm_judge
# x-gateway-judge-confidence: 0.92
# x-gateway-routing-policy: balanced

Works with the standard OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.clawrouter.app/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="auto",  # intelligently routed
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Routing-Policy": "balanced"},
)
print(response.choices[0].message.content)

Works with the standard OpenAI Node.js SDK:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.clawrouter.app/v1",
  apiKey: "YOUR_API_KEY",
});

const response = await client.chat.completions.create({
  model: "auto",  // intelligently routed
  messages: [{ role: "user", content: "Hello" }],
}, { headers: { "X-Routing-Policy": "balanced" } });
console.log(response.choices[0].message.content);

Get started

Enter your email — we'll send you a magic link. No password needed.

Already have an account? Sign in to dashboard →

Simple, transparent pricing

Weekly subscription + pay-per-token. Your allowance is deducted first, then per-token billing kicks in. No surprise invoices.

Spark
Light usage & hobby projects
$5/week
≈ $20 / month
  • $5 weekly allowance
  • 500k tokens / 5hr window
  • 2M tokens / week
  • Intelligent auto-routing
Get Started
🔵
Pulse
Daily driver for developers
$10/week
≈ $40 / month
  • $10 weekly allowance
  • 1M tokens / 5hr window
  • 4M tokens / week
  • Intelligent auto-routing
Get Started
Most Popular
🧠
Cortex
Power users & coding agents
$25/week
≈ $100 / month
  • $25 weekly allowance
  • 2.5M tokens / 5hr window
  • 10M tokens / week
  • Intelligent auto-routing
Get Started
🔥
Apex
Intensive workloads & teams
$50/week
≈ $200 / month
  • $50 weekly allowance
  • 5M tokens / 5hr window
  • 20M tokens / week
  • Intelligent auto-routing
Get Started
81%
avg cost reduction
vs routing every call to Opus — measured on real traffic mix
87%
routing accuracy
correct on labeled eval set across fast, standard, and reasoning tiers
Avg latency by tier
fast (Haiku) ~200ms
standard (Sonnet) ~1.5s
reasoning (Opus) ~4s

All tiers include pay-per-token billing after the weekly allowance is used. Full pricing details →