Completions
API reference for the chat completions endpoint
Creates a model response for the given chat conversation. Compatible with any OpenAI SDK. Sansa auto-routes requests to the best underlying model based on conversation content, tools, and reasoning configuration.
POST /v1/chat/completions
See the code panel for request and response examples.
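As a minimal illustration of a request body (parameter names as documented below; this snippet only builds the JSON, it does not send the request):

```python
import json

# Minimal request body for POST /v1/chat/completions. "sansa-auto" is the
# only accepted model value; the field may also be omitted entirely.
payload = {
    "model": "sansa-auto",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
}

body = json.dumps(payload)
```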
Body Parameters
messages Message[] required
An array of messages comprising the conversation so far. Sansa supports text content only. Image URLs and audio inputs return a 400 error with code "unsupported_modality".
role string required
One of "system", "user", "assistant", "tool".
{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "..."}
{"role": "tool", "content": "..."}content string | ContentPart[] | null
The text content of the message. Can be a plain string or an array of content parts. Only type: "text" content parts are accepted; type: "image_url" returns a 400 error. Set to null on assistant messages that contain tool_calls instead of text.
// String content
{"role": "user", "content": "Hello"}
// ContentPart array (text only)
{"role": "user", "content": [{"type": "text", "text": "Hello"}]}
// null when assistant uses tools
{"role": "assistant", "content": null, "tool_calls": [...]}tool_calls ToolCall[] | null assistant messages only
An array of tool calls the model wants to make. Each entry contains an id, type: "function", and a function object with name and arguments (a JSON string). Present when the model decides to call one or more tools instead of generating text content.
{
"role": "assistant",
"content": null,
"tool_calls": [{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\": \"Tokyo\"}"
}
}]
}
tool_call_id string | null tool messages only
Must match an id from a prior tool_calls entry. This is how the model associates a tool result with the call that requested it.
{
"role": "tool",
"tool_call_id": "call_abc123",
"content": "{\"temp\": 22, \"unit\": \"celsius\"}"
}
reasoning string | null assistant messages only
The model's reasoning text, if reasoning was enabled and not excluded. Used when passing assistant messages back for multi-turn conversations that include reasoning.
reasoning_details ReasoningDetail[] | null assistant messages only
Structured reasoning blocks returned by reasoning-capable models. These should be passed back verbatim in subsequent requests to maintain reasoning continuity across turns.
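A sketch of the verbatim pass-back. The reasoning_details entry shown here is made up; treat the real shape as opaque and echo it back unchanged on the next turn:

```python
# Illustrative reasoning_details entry; the internal shape of each detail
# is provider-defined and should be treated as opaque.
assistant_msg = {
    "role": "assistant",
    "content": "The answer is 42.",
    "reasoning": "Compared the candidate answers...",
    "reasoning_details": [{"type": "reasoning.text", "text": "..."}],
}

# Next-turn request: include the assistant message, details untouched.
messages = [
    {"role": "user", "content": "What is the answer?"},
    assistant_msg,
    {"role": "user", "content": "Why?"},
]
```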
model string | null
Must be "sansa-auto" or null (omitted). Any other value returns a 400 error with code invalid_model. Sansa auto-routes to the best underlying model. You do not choose the model directly.
stream boolean default: false
If true, the response is delivered as Server-Sent Events (SSE). Compatible with OpenAI SDK streaming. Usage data is always included in the final streaming chunk. The stream_options parameter is accepted but ignored.
See Streaming for the full streaming reference.
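A sketch of consuming a stream by concatenating content deltas. The chunks here are mock dicts in the OpenAI chunk shape (choices[0].delta.content), not a live SSE stream:

```python
# Mock chunks; a real stream arrives as Server-Sent Events. Per the docs,
# usage data is always included in the final chunk.
mock_chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}],
     "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}},
]

text = ""
usage = None
for chunk in mock_chunks:
    delta = chunk["choices"][0]["delta"]
    text += delta.get("content") or ""   # final delta may be empty
    usage = chunk.get("usage") or usage  # keep the last usage seen
```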
temperature number default: 1.0
Sampling temperature between 0 and 2. Higher values produce more random output; lower values produce more deterministic output. Forwarded to the underlying model. Sansa generally performs best when this is left at the default. Override only when you have a specific reason.
max_tokens integer optional deprecated
Maximum number of tokens to generate. Must be at least 1 if provided. Deprecated by OpenAI in favor of max_completion_tokens. Still accepted by Sansa. If both are set, max_completion_tokens takes precedence.
max_completion_tokens integer optional
Upper bound on the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Must be at least 1 if provided.
Takes precedence over max_tokens when both are set.
tools array of Tool optional
A list of tools the model may call. Each tool has type: "function" and a function object containing name, description, and parameters (a JSON Schema object).
See Tools for the full tool call lifecycle.
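A sketch of a tools array with one hypothetical get_weather function:

```python
# One tool: type "function" plus a function object with name, description,
# and a JSON Schema "parameters" object. get_weather is hypothetical.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}]
```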
tool_choice string | object default: "auto"
Controls which tool (if any) the model calls.
"auto"-- model decides whether to call a tool or generate text. Default whentoolsis present."none"-- model will not call any tool. Default when notoolsare provided."required"-- model must call at least one tool.{"type": "function", "function": {"name": "my_function"}}-- forces the model to call the named function.
parallel_tool_calls boolean default: true
Whether to allow the model to make multiple tool calls in a single response. When true, the model can return multiple entries in the tool_calls array.
response_format object optional
Forces the model to produce output in a specific format.
{"type": "text"}-- default. Unstructured text output.{"type": "json_object"}-- model output is guaranteed to be valid JSON.{"type": "json_schema", "json_schema": {"name": "...", "strict": true, "schema": {...}}}-- structured outputs. The response conforms to the provided JSON Schema.
json_schema is preferred over json_object when the underlying model supports it.
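A sketch of a json_schema response format with a hypothetical weather_report schema, and parsing the (illustrative) conforming output:

```python
import json

# Hypothetical weather_report schema for structured outputs.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temp_c": {"type": "number"},
            },
            "required": ["city", "temp_c"],
            "additionalProperties": False,
        },
    },
}

# With this format, message content parses as JSON conforming to the
# schema (the content below is illustrative).
report = json.loads('{"city": "Tokyo", "temp_c": 22}')
```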
reasoning object optional
Reasoning (extended thinking) configuration, following the OpenRouter convention.
| Field | Type | Description |
|---|---|---|
effort | string | One of "xhigh", "high", "medium", "low", "minimal", "none". |
max_tokens | integer | Direct token budget for reasoning. |
exclude | boolean | If true, reasoning is used internally but not returned in the response. |
If both reasoning and reasoning_effort are provided, reasoning takes precedence.
The router uses this as an input to model selection. The final reasoning effort applied may differ from what you requested.
See Reasoning for full details.
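The precedence rule can be sketched as follows; effective_reasoning is an illustrative helper mimicking the documented behavior, not part of the API:

```python
def effective_reasoning(req: dict):
    """Documented precedence: reasoning > reasoning_effort > nothing."""
    if req.get("reasoning") is not None:
        return req["reasoning"]
    if req.get("reasoning_effort") is not None:
        # reasoning_effort is normalized internally to a reasoning object.
        return {"effort": req["reasoning_effort"]}
    return None

# reasoning wins over reasoning_effort when both are present.
request_fragment = {
    "reasoning": {"effort": "high", "exclude": False},
    "reasoning_effort": "low",
}
chosen = effective_reasoning(request_fragment)
```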
reasoning_effort string optional
Top-level OpenAI-style reasoning effort. Accepted values: "xhigh", "high", "medium", "low", "minimal", "none".
Normalized internally to a reasoning object. If both reasoning and reasoning_effort are present, reasoning takes precedence.
top_p number optional passed through
Nucleus sampling. Value between 0 and 1. The model considers only the tokens comprising the top top_p probability mass.
Forwarded directly to the underlying model. Not used by Sansa's routing logic. Altering both top_p and temperature simultaneously is generally not recommended.
stop string | array optional passed through
Up to 4 sequences where the model will stop generating further tokens. The returned text will not contain the stop sequence.
Forwarded directly to the underlying model.
frequency_penalty number optional passed through
Number between -2.0 and 2.0. Positive values penalize tokens based on their existing frequency in the text so far, reducing repetition.
Forwarded directly to the underlying model.
presence_penalty number optional passed through
Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared in the text, encouraging the model to cover new topics.
Forwarded directly to the underlying model.
metadata object optional
A set of up to 16 key-value pairs that can be attached to the request. Keys: max 64 characters. Values: max 512 characters.
The key call_name is special: its value labels the request in the Sansa dashboard. call_name must be 64 characters or less and cannot be empty or whitespace-only. There is no top-level call_name parameter -- it lives exclusively inside metadata.
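The constraints above can be sketched as a client-side pre-check. validate_metadata is an illustrative helper, not part of any SDK; the server enforces these limits regardless:

```python
def validate_metadata(metadata: dict) -> None:
    """Mirror the documented limits: <= 16 pairs, keys <= 64 chars,
    values <= 512 chars, call_name non-empty and <= 64 chars."""
    if len(metadata) > 16:
        raise ValueError("at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"key too long: {key!r}")
        if len(str(value)) > 512:
            raise ValueError(f"value too long for key: {key!r}")
    call_name = metadata.get("call_name")
    if call_name is not None and (not call_name.strip() or len(call_name) > 64):
        raise ValueError("invalid_call_name")

validate_metadata({"call_name": "nightly-eval", "env": "prod"})  # ok

try:
    validate_metadata({"call_name": "   "})  # whitespace-only: rejected
    rejected = False
except ValueError:
    rejected = True
```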
n integer default: 1 not supported
Number of completions to generate. Only n=1 is accepted. Requests with n > 1 return a 400 error with code unsupported_parameter.
audio object not supported
Audio output configuration. Sansa is text-only. Providing this parameter returns a 400 error with code unsupported_parameter.
modalities array not supported
Output modalities. Only ["text"] is accepted. Including "audio" returns a 400 error with code unsupported_parameter.
web_search_options object not supported
Web search configuration. Not supported. Returns a 400 error with code unsupported_parameter.
functions array deprecated not supported
Legacy function definitions. Replaced by tools. Returns a 400 error with code unsupported_parameter.
function_call string | object deprecated not supported
Legacy function call control. Replaced by tool_choice. Returns a 400 error with code unsupported_parameter.
logit_bias object ignored
Accepted for compatibility. Silently ignored.
logprobs boolean ignored
Accepted for compatibility. Silently ignored.
top_logprobs integer ignored
Accepted for compatibility. Silently ignored.
seed integer ignored
Accepted for compatibility. Silently ignored.
stream_options object ignored
Accepted for compatibility. Silently ignored. Sansa always includes usage data in the final streaming chunk regardless of this setting.
prediction object ignored
Accepted for compatibility. Silently ignored.
store boolean ignored
Accepted for compatibility. Silently ignored.
service_tier string ignored
Accepted for compatibility. Silently ignored.
prompt_cache_key string ignored
Accepted for compatibility. Silently ignored.
prompt_cache_retention string ignored
Accepted for compatibility. Silently ignored.
safety_identifier string ignored
Accepted for compatibility. Silently ignored.
user string ignored
Accepted for compatibility. Silently ignored.
verbosity string ignored
Accepted for compatibility. Silently ignored.
Returns
A ChatCompletion object.
| Field | Type | Description |
|---|---|---|
id | string | Unique identifier for the completion. |
object | string | Always "chat.completion". |
created | integer | Unix timestamp of when the completion was created. |
model | string | Always returns "sansa-auto". |
choices | array | Completion choices. Always length 1. |
usage | object | Token usage statistics. |
choices[].message
| Field | Type | Description |
|---|---|---|
role | string | Always "assistant". |
content | string | null | Text content. null when tool_calls is present. |
tool_calls | array | null | Tool calls the model wants to make, if any. |
reasoning | string | null | Reasoning text, if reasoning was enabled and not excluded. |
reasoning_details | array | null | Structured reasoning blocks. |
choices[].finish_reason
| Value | Meaning |
|---|---|
stop | Natural completion or stop sequence hit. |
length | Hit max_tokens / max_completion_tokens limit. |
tool_calls | Model wants to call one or more tools. |
content_filter | Content was filtered by the underlying provider. |
error | Error during generation. |
usage
| Field | Type | Description |
|---|---|---|
prompt_tokens | integer | Number of tokens in the input. |
completion_tokens | integer | Number of tokens in the output. |
total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
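Putting the tables together, a response might be consumed like this (ids, timestamps, and content are made up):

```python
# Illustrative ChatCompletion object matching the field tables above.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "sansa-auto",
    "choices": [{
        "message": {"role": "assistant", "content": "Hello!",
                    "tool_calls": None},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3,
              "total_tokens": 15},
}

message = response["choices"][0]["message"]
usage = response["usage"]
```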
Errors
All errors follow this format:
{
"error": {
"code": "error_code",
"message": "Human-readable description."
}
}
| Status | Code | Cause |
|---|---|---|
400 | invalid_model | model is not "sansa-auto" or null. |
400 | invalid_call_name | call_name is empty, whitespace-only, or exceeds 64 characters. |
400 | unsupported_modality | Image or audio content detected in messages. |
400 | unsupported_parameter | n > 1, audio, web_search_options, functions, or function_call provided. |
401 | unauthorized | Invalid or missing API key. |
402 | insufficient_credits | Account balance is zero or negative. |
429 | rate_limit_exceeded | Too many requests. |
500 | internal_error | Unexpected server error. |
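A sketch of dispatching on the error envelope; which codes to retry is an application-level choice, not part of the API:

```python
import json

# Treat rate limits and server errors as retryable; everything else as
# a hard failure. This split is an application-level assumption.
RETRYABLE = {"rate_limit_exceeded", "internal_error"}

def classify(body: str) -> str:
    """Return "retry" or "fail" based on the error envelope's code."""
    code = json.loads(body)["error"]["code"]
    return "retry" if code in RETRYABLE else "fail"

raw = '{"error": {"code": "insufficient_credits", "message": "Balance is zero."}}'
decision = classify(raw)
```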
Billing
- Credits are checked before each request.
- Cost is estimated before the request and reserved from your balance.
- After the response completes, cost is recalculated using actual token usage and the difference is refunded or deducted.
- The usage field in the response reflects actual token consumption.
- Pricing is per-million tokens, billed separately for input and output.
- Failed requests are not charged.
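A worked example of the reserve-then-reconcile arithmetic, using hypothetical per-million prices (real rates are set by Sansa and are not part of this reference):

```python
# Hypothetical per-million prices, for illustration only.
INPUT_PRICE_PER_M = 1.50   # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 6.00  # USD per 1M output tokens (assumed)

def cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * INPUT_PRICE_PER_M
            + completion_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

reserved = cost_usd(2_000, 1_000)  # pre-request estimate
actual = cost_usd(1_800, 600)      # recomputed from the usage field
refund = reserved - actual         # refunded (or deducted if negative)
```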