Smart Routing

How sansa-auto picks the best model for each request, including reasoning behavior

Smart routing is what powers sansa-auto. Instead of hard-coding a model, you send your request to sansa-auto and Sansa picks the best underlying model for that specific request, balancing quality and cost. It then proxies the call and returns the standard response shape plus a sansa metadata object.

This page covers how to call smart routing, what signals it uses, and — importantly — how reasoning interacts with routing. For the broader comparison of auto-routing vs. calling a specific model directly, see the Models docs.


How to call it

Set the model field to "sansa-auto". There is no separate endpoint or parameter — smart routing runs on the same POST /v1/chat/completions request.

{
  "model": "sansa-auto",
  "messages": [{"role": "user", "content": "Plan a week-long trip to Kyoto."}]
}

Three forms of the model field trigger smart routing, and all three are equivalent:

model value         Behavior
"sansa-auto"        Smart route.
null                Smart route.
Omitted entirely    Smart route.

Any other value is treated as a direct model call (e.g. "openai/gpt-5.4"), and no routing runs. An unrecognized ID returns 400 invalid_model. See Models for direct calls and provider failover.
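
The rule above can be sketched as a small client-side check. This is only an illustration of the documented behavior, not part of any Sansa SDK: triggersSmartRouting and buildRequestBody are hypothetical helper names, and the ChatMessage shape is a simplified assumption.

```typescript
// Hypothetical helpers (not part of a Sansa SDK) illustrating which
// `model` values trigger smart routing.
interface ChatMessage {
  role: string;
  content: string;
}

function triggersSmartRouting(model: string | null | undefined): boolean {
  // "sansa-auto", null, and an omitted field are documented as equivalent.
  return model === "sansa-auto" || model === null || model === undefined;
}

// Builds a request body; when `model` is undefined the field is left out
// entirely, which the docs say also smart-routes.
function buildRequestBody(
  messages: ChatMessage[],
  model?: string | null
): Record<string, unknown> {
  const body: Record<string, unknown> = { messages };
  if (model !== undefined) {
    body.model = model;
  }
  return body;
}
```

Any other string, such as "openai/gpt-5.4", makes triggersSmartRouting return false, matching the direct-call behavior described above.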

See the code panel for full TypeScript, Python, and curl examples.


What the response tells you

A smart-routed response is a normal ChatCompletion, with two things to note:

  • model — the model that actually served the request (e.g. "anthropic/claude-sonnet-4.6"), not "sansa-auto".
  • sansa — routing metadata:
interface SansaCompletionExtension {
  // True for smart-routed (sansa-auto / null / omitted) requests.
  routed: boolean;

  // The model the router selected. Only set when routed is true.
  routed_model: string | null;

  // Routing latency in milliseconds. Only set when routed is true.
  routing_latency_ms: number | null;

  // Total cost for this completion in USD, billed at the served
  // model's per-token rates.
  cost: number | null;
}

When streaming, the sansa object appears on the first chunk (routing fields) and again on the final chunk that carries usage (with cost). See Streaming.
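
Because the sansa object arrives in two parts when streaming, a client that wants the complete metadata has to combine them. A minimal sketch, assuming each chunk carries an optional sansa field with a subset of the fields above (StreamChunkLike and collectSansaMeta are hypothetical names, and the exact chunk shape is not confirmed here):

```typescript
interface SansaMeta {
  routed: boolean;
  routed_model: string | null;
  routing_latency_ms: number | null;
  cost: number | null;
}

// Assumed chunk shape: only the optional `sansa` field matters for this sketch.
interface StreamChunkLike {
  sansa?: Partial<SansaMeta>;
}

// Merges the routing fields (first chunk) with the cost (final usage chunk).
// Returns null if no chunk carried a sansa object.
function collectSansaMeta(chunks: StreamChunkLike[]): SansaMeta | null {
  let merged: SansaMeta | null = null;
  for (const chunk of chunks) {
    if (!chunk.sansa) continue;
    const base: SansaMeta = merged ?? {
      routed: false,
      routed_model: null,
      routing_latency_ms: null,
      cost: null,
    };
    // Later non-null values win, so a cost on the final chunk fills in
    // the value that was missing on the first chunk.
    merged = {
      routed: chunk.sansa.routed ?? base.routed,
      routed_model: chunk.sansa.routed_model ?? base.routed_model,
      routing_latency_ms: chunk.sansa.routing_latency_ms ?? base.routing_latency_ms,
      cost: chunk.sansa.cost ?? base.cost,
    };
  }
  return merged;
}
```

The merge is written so that it gives the same result whether or not the final chunk repeats the routing fields.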


What the router considers

For text requests, the router reads the request and selects a model based on:

  • Message content — the conversation itself is the primary signal for matching a model's capability profile.
  • Tools — if tools are present, only models that support tool calling are eligible. tool_choice is also respected.
  • Reasoning — whether reasoning is requested, and at what effort. See below.
  • Input modalities — image or audio inputs route to a multimodal-capable model automatically (see Multimodal requests).

Among the eligible models, the router selects the best match and breaks ties by cost. You are billed at the per-token rate of whichever model actually served the request.
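
To make the filter-then-rank idea concrete, here is a toy model of it. This is not Sansa's router: the ModelCandidate fields, the numeric matchScore, the single costPerMTok figure, and the assumption that the cheaper model wins a tie are all hypothetical stand-ins for signals the real router computes internally.

```typescript
// Toy illustration only. All fields below are hypothetical.
interface ModelCandidate {
  id: string;
  supportsTools: boolean;
  isReasoningModel: boolean;
  matchScore: number;  // hypothetical: higher means a better fit for the messages
  costPerMTok: number; // hypothetical price figure, used only to break ties
}

interface RoutingNeeds {
  usesTools: boolean;
  wantsReasoning: boolean;
}

function pickModel(
  candidates: ModelCandidate[],
  needs: RoutingNeeds
): ModelCandidate | null {
  const eligible = candidates.filter(
    (m) =>
      // Requests with tools only consider models that support tool calling.
      (!needs.usesTools || m.supportsTools) &&
      // Reasoning requests go to reasoning models, all others to non-reasoning models.
      m.isReasoningModel === needs.wantsReasoning
  );
  // Best match first; on an equal score the cheaper model wins (assumed direction).
  const sorted = [...eligible].sort(
    (a, b) => b.matchScore - a.matchScore || a.costPerMTok - b.costPerMTok
  );
  return sorted[0] ?? null;
}
```

The point of the sketch is the ordering of concerns: capability requirements remove candidates first, and cost only matters among models that match equally well.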


Reasoning with smart routing

This is the most important behavior to understand: reasoning is opt-in, and it changes which models are eligible.

No reasoning requested → non-reasoning models only

If your request to sansa-auto includes no reasoning object and no reasoning_effort, the router will only route to non-reasoning models. It will not silently send your request to a reasoning model or enable extended thinking on your behalf.

{
  "model": "sansa-auto",
  "messages": [{"role": "user", "content": "What is the capital of France?"}]
}

The request above is guaranteed to land on a non-reasoning model.

Reasoning requested → reasoning models, effort respected

If you provide reasoning — either the reasoning object or the top-level reasoning_effort — the router restricts candidates to reasoning-capable models at the matching tier, and your requested effort is forwarded to the underlying model unchanged.

{
  "model": "sansa-auto",
  "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
  "reasoning": { "effort": "high" }
}

The two formats are equivalent and either one enables reasoning routing:

{ "reasoning": { "effort": "high" } }
{ "reasoning_effort": "high" }

If both are present, the reasoning object takes precedence. See the Reasoning docs for the full parameter reference, response shape, and billing.

Notes and edge cases

  • An effort of "none" and omitting reasoning are equivalent for routing: both keep you on non-reasoning models. The simplest way to guarantee a non-reasoning model is to omit reasoning entirely.
  • reasoning.max_tokens without an effort does not influence model selection (routing treats it as "no reasoning"), but the budget is still forwarded to the model.
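
The reasoning rules in this section can be summarized as one decision function. This is a client-side sketch for predicting which pool a request lands in, not the router's implementation. resolveReasoningRouting is a hypothetical name, and one case is an assumption rather than documented behavior: a reasoning object that has no effort is treated here as falling back to reasoning_effort if that is also present.

```typescript
interface ReasoningFields {
  reasoning?: { effort?: string; max_tokens?: number };
  reasoning_effort?: string;
}

type RoutingPool = "reasoning" | "non-reasoning";

function resolveReasoningRouting(
  req: ReasoningFields
): { pool: RoutingPool; effort: string | null } {
  // The reasoning object takes precedence when both forms carry an effort.
  // Assumption: if the object has no effort, fall back to reasoning_effort.
  const effort = req.reasoning?.effort ?? req.reasoning_effort;

  // No effort (including max_tokens-only) and "none" both mean
  // non-reasoning routing. Any max_tokens budget is still forwarded
  // to the model; it just does not affect model selection.
  if (effort === undefined || effort === "none") {
    return { pool: "non-reasoning", effort: null };
  }
  return { pool: "reasoning", effort };
}
```

For example, a request with only reasoning.max_tokens set resolves to the non-reasoning pool, while either { "reasoning": { "effort": "high" } } or { "reasoning_effort": "high" } resolves to the reasoning pool with effort "high".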


Multimodal requests

If any message contains image (image_url) or audio (input_audio) content, the request is routed to a multimodal-capable model automatically — you don't need to do anything beyond including the content. Content-based capability matching is not run for these requests; they're dispatched directly to a model that can handle the modality. See Completions for the input formats.
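
If you want to know ahead of time whether a request will take this multimodal path, a check along these lines may help. It assumes message content is either a plain string or an array of parts tagged with a type field ("image_url" or "input_audio"); that part layout, and the names MessageLike and needsMultimodalModel, are assumptions for illustration, so confirm the actual input formats in Completions.

```typescript
// Assumed shapes: content is a string or an array of typed parts.
interface ContentPartLike {
  type: string;
  [key: string]: unknown;
}

interface MessageLike {
  role: string;
  content: string | ContentPartLike[];
}

// Returns true if any message contains an image or audio part.
function needsMultimodalModel(messages: MessageLike[]): boolean {
  return messages.some(
    (m) =>
      Array.isArray(m.content) &&
      m.content.some(
        (part) => part.type === "image_url" || part.type === "input_audio"
      )
  );
}
```

A single image or audio part anywhere in the conversation is enough to return true, mirroring the "if any message contains" wording above.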


Failover

Smart routing produces a ranked list of candidate models, not just one. If the selected model's provider is rate-limited, down, or returns a transient error, Sansa automatically retries with the next candidate in the list. This is transparent to your client: you receive a single successful response, and usage and cost reflect whichever model actually answered.
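
Conceptually, the failover loop looks something like the sketch below. It runs on Sansa's side, so you never write this yourself; it is included only to show the shape of the behavior. The function name callWithFailover, the attempt and isTransient callbacks, and the synchronous style are all simplifications assumed for illustration.

```typescript
// Conceptual sketch only: tries each ranked candidate in order and returns
// the first success together with the model that produced it.
function callWithFailover<T>(
  rankedModels: string[],
  attempt: (model: string) => T,
  isTransient: (err: unknown) => boolean
): { model: string; result: T } {
  let lastError: unknown = new Error("no candidate models");
  for (const model of rankedModels) {
    try {
      return { model, result: attempt(model) };
    } catch (err) {
      // Only rate limits, outages, and other transient errors move on to
      // the next candidate; anything else is surfaced immediately.
      if (!isTransient(err)) throw err;
      lastError = err;
    }
  }
  // Every candidate failed with a transient error.
  throw lastError;
}
```

The returned model is the one that answered, which is why usage and cost in the response can belong to a fallback rather than the first choice.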


Pricing

Smart-routed requests are billed at the per-token rate of the model the router selected (or the fallback model, if failover occurred). Reasoning tokens count as output tokens. Token rates are listed on the Models page. See Models and Completions for the full billing flow.
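
As a rough worked example of the billing rule, the sketch below estimates the cost of one completion. The rates are made-up numbers, not real prices, and the per-million-token unit is an assumption. It also assumes you have the reasoning token count separately from the other output tokens; if your usage object already folds reasoning tokens into the output count, pass 0 for reasoningTokens.

```typescript
// Hypothetical rate shape: USD per million tokens.
interface TokenRates {
  inputPerMTok: number;
  outputPerMTok: number;
}

function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  reasoningTokens: number,
  rates: TokenRates
): number {
  // Reasoning tokens are billed at the output rate.
  const billedOutput = outputTokens + reasoningTokens;
  return (
    (inputTokens * rates.inputPerMTok + billedOutput * rates.outputPerMTok) / 1e6
  );
}

// Made-up rates: $3 per million input tokens, $15 per million output tokens.
const exampleCost = estimateCostUsd(1000, 200, 300, {
  inputPerMTok: 3,
  outputPerMTok: 15,
});
// (1000 * 3 + 500 * 15) / 1e6 is approximately 0.0105 USD
```

The authoritative figure is always the cost field in the sansa object; an estimate like this is only useful for budgeting before a call.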