Documentation
Everything you need to integrate Kestrel and start saving.
Quickstart
Get started in 3 steps. No SDK to install, no code migration.
1. Create your API key
Sign in to the dashboard with GitHub or Google. Go to the API Keys tab, enter your provider API keys (OpenAI, Anthropic, etc.), and generate a Kestrel key.
2. Change your base URL
Replace your provider's base URL with Kestrel's. Your existing code stays the same.
# Before
client = OpenAI(api_key="sk-proj-...")

# After
client = OpenAI(
    base_url="https://api.usekestrel.io/v1",
    api_key="ks-your-key",
)
3. Send requests as normal
Every request is analyzed and routed to the cheapest model that can handle it. Simple prompts go to economy models, complex ones stay on premium.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
That's it. View your savings in real-time on the dashboard.
How Routing Works
Every request goes through a 3-stage pipeline in under 2ms:
- Analyze — Extract features from the request: message length, keywords, tools, system prompt complexity, conversation depth
- Classify — Score the request across 5 dimensions (reasoning, output complexity, domain specificity, instruction nuance, error tolerance) and assign a tier: Economy, Standard, or Premium
- Route — Select the cheapest available model in the assigned tier from your configured providers
The semantic cache adds a Stage 0: if a similar request was recently answered, the cached response is returned instantly with zero provider cost.
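The pipeline above can be sketched roughly as follows. All names, features, and scoring thresholds here are illustrative assumptions, not Kestrel's actual implementation (the real routing engine is open source and more sophisticated):

```python
# Illustrative sketch of the Analyze -> Classify -> Route pipeline.
# Tier contents mirror the "Tier examples" below; heuristics are made up.

TIERS = {
    "economy": ["gpt-4o-mini", "gemini-2.5-flash"],
    "standard": ["gpt-4o-mini", "claude-haiku"],
    "premium": ["gpt-4o", "claude-sonnet", "grok-3"],
}

def analyze(request):
    """Stage 1: extract cheap features from the request."""
    text = " ".join(m["content"] for m in request["messages"])
    return {
        "length": len(text),
        "has_tools": bool(request.get("tools")),
        "depth": len(request["messages"]),
        "has_code": "def " in text or "class " in text,
    }

def classify(features):
    """Stage 2: score the request and assign a tier."""
    score = 0
    score += 2 if features["has_tools"] else 0
    score += 2 if features["has_code"] else 0
    score += 1 if features["length"] > 500 else 0
    score += 1 if features["depth"] > 4 else 0
    if score >= 3:
        return "premium"
    if score >= 1:
        return "standard"
    return "economy"

def route(request):
    """Stage 3: pick the cheapest model in the assigned tier."""
    tier = classify(analyze(request))
    return tier, TIERS[tier][0]  # models listed cheapest-first

tier, model = route({"messages": [{"role": "user", "content": "What is 2+2?"}]})
print(tier, model)  # economy gpt-4o-mini
```

A trivial prompt scores zero and lands in Economy, while a long, tool-using request with code scores high enough to stay on Premium.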
Tier examples
- Economy — "What is 2+2?", "Say hello", simple translations → gpt-4o-mini, gemini-2.5-flash
- Standard — Moderate analysis, comparisons, summaries → gpt-4o-mini, claude-haiku
- Premium — Complex architecture design, multi-step reasoning with code, domain expertise → gpt-4o, claude-sonnet, grok-3
SDK Examples
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
    base_url="https://api.usekestrel.io/v1",
    api_key="ks-your-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
print(response.choices[0].message.content)
JavaScript/TypeScript (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.usekestrel.io/v1",
  apiKey: "ks-your-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Explain quantum computing" }],
});
console.log(response.choices[0].message.content);
cURL
curl https://api.usekestrel.io/v1/chat/completions \
  -H "Authorization: Bearer ks-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://api.usekestrel.io/v1",
    api_key="ks-your-key",
)
response = llm.invoke("Explain quantum computing")
print(response.content)
API Reference
Kestrel is fully OpenAI API-compatible. Any SDK or tool that works with OpenAI works with Kestrel.
Base URL
https://api.usekestrel.io/v1
Authentication
Pass your Kestrel API key in the Authorization header:
Authorization: Bearer ks-your-key
Endpoints
- POST /v1/chat/completions — Send a chat completion request. Supports streaming.
- GET /health — Health check. Returns {"status": "ok"}.
- GET /api/dashboard/savings — Usage and savings summary for your API key.
- GET /api/dashboard/routing — Routing tier distribution (economy/standard/premium/cache).
- GET /api/dashboard/analytics — Per-model usage analytics.
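The dashboard endpoints are plain authenticated GETs, so any HTTP client works. A minimal stdlib sketch that builds the request (the actual network call is commented out, since it needs a valid key):

```python
import urllib.request

BASE_URL = "https://api.usekestrel.io"

def dashboard_request(path, api_key):
    """Build an authenticated GET request for a dashboard endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": "Bearer " + api_key},
    )

req = dashboard_request("/api/dashboard/savings", "ks-your-key")
# body = urllib.request.urlopen(req).read()  # executes the call; returns JSON
print(req.full_url)  # https://api.usekestrel.io/api/dashboard/savings
```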
Request format
Standard OpenAI chat completions format. All fields are supported:
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "..."}],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false,
  "tools": [...],
  "response_format": {"type": "json_object"}
}
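With "stream": true, responses arrive as OpenAI-style server-sent events. A minimal stdlib parser, assuming the standard OpenAI chunk shape (content deltas under choices[0].delta.content, terminated by a data: [DONE] line):

```python
import json

def parse_sse_chunks(raw_lines):
    """Yield content deltas from OpenAI-style SSE lines."""
    for line in raw_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            return
        event = json.loads(data)
        delta = event["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Example SSE lines as they would appear in a streamed response body:
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(sample)))  # Hello
```

In practice the OpenAI SDKs handle this parsing for you when you pass stream=True.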
Dashboard
The dashboard shows real-time analytics:
- Savings — total requests, baseline cost, actual cost, and savings percentage
- Routing — breakdown by tier (economy, standard, premium, cache)
- Provider health — status of each connected provider
- Model analytics — per-model request counts, tokens, and costs
- API keys — create, list, and revoke keys with provider credentials
- CSV export — download usage data for your own analysis
Supported Providers
Kestrel routes across all major LLM providers. Add as many as you want when creating your API key:
- OpenAI — GPT-4o, GPT-4o-mini
- Anthropic — Claude Sonnet 4.6, Claude Haiku 4.5
- Google — Gemini 2.5 Pro, Gemini 2.5 Flash
- Groq — Llama 3.1 8B, Llama 3.1 70B (ultra-fast inference)
- xAI — Grok-3, Grok-3-mini
- Mistral — Mistral Large, Mistral Small
- Cohere — Command R+, Command R
- Together AI — Open-source model hosting
The more providers you add, the more routing options Kestrel has to find savings.
Security
- Encryption at rest — Provider API keys are encrypted with AES-256 (Fernet) before storage
- HTTPS everywhere — All traffic encrypted in transit via TLS
- No prompt logging — We do not store or log your prompt content or model responses
- Tenant isolation — Each customer's cache and data is fully isolated
- Revocable keys — Revoke any API key instantly from the dashboard
- Open source core — The routing engine is open source and auditable
FAQ
Will my responses be different?
For simple requests routed to cheaper models, responses may differ slightly in style but not in correctness. Complex requests stay on premium models and produce identical results. You can always set a tier floor to prevent downgrading below a certain level.
What if a cheaper model gives a bad response?
The routing classifier is conservative — it only downgrades when confident the cheaper model can handle it. Over time, the system learns from outcome signals and improves its routing decisions.
Does Kestrel add latency?
The routing classification takes less than 2ms. Semantic cache hits are near-instant. Total added latency is negligible compared to LLM inference time.
Can I force a specific model?
Yes. If you request a model that Kestrel recognizes as economy-tier (like gpt-4o-mini), it won't route up to a more expensive model. The system only routes cheaper, never more expensive.
How does billing work?
You pay 15% of the savings Kestrel generates. If your baseline cost would have been $1,000 and Kestrel reduces it to $400, you pay 15% of the $600 saved = $90. Your total cost: $490 instead of $1,000. If savings are $0, you pay $0.
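The arithmetic above as a quick sanity check:

```python
def kestrel_bill(baseline_cost, actual_cost, share=0.15):
    """Return (fee, total paid) under the 15%-of-savings model."""
    savings = max(baseline_cost - actual_cost, 0)
    fee = share * savings
    return fee, actual_cost + fee

fee, total = kestrel_bill(1000, 400)
print(fee, total)  # fee 90.0, total 490.0 (vs. 1000 baseline)

fee, total = kestrel_bill(1000, 1000)
print(fee, total)  # no savings, no fee: 0.0, 1000.0
```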
Is Kestrel open source?
The core routing engine is open source at github.com/andber6/kestrel. The managed service (billing, caching, dashboard) is the commercial product.