Completions

API reference for the chat completions endpoint

Creates a model response for the given chat conversation. Compatible with any OpenAI SDK. Sansa auto-routes requests to the best underlying model based on conversation content, tools, and reasoning configuration.

POST /v1/chat/completions

See the code panel for request and response examples.
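As a concrete starting point, the snippet below builds the minimal request body (a sketch; any OpenAI-compatible SDK or HTTP client can send it to this endpoint):

```python
import json

# Minimal request body for POST /v1/chat/completions.
# "sansa-auto" is the only accepted model value (the field may also be omitted).
payload = {
    "model": "sansa-auto",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
}

body = json.dumps(payload)
```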


Body Parameters

messages Message required

An array of messages comprising the conversation so far. Sansa supports text content only. Image URLs and audio inputs return a 400 error with code "unsupported_modality".

role string required

One of "system", "user", "assistant", "tool".

{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "..."}
{"role": "tool", "content": "..."}

content string | ContentPart[] | null

The text content of the message. Can be a plain string or an array of content parts. Only type: "text" content parts are accepted; type: "image_url" returns a 400 error. Set to null on assistant messages that contain tool_calls instead of text.

// String content
{"role": "user", "content": "Hello"}

// ContentPart array (text only)
{"role": "user", "content": [{"type": "text", "text": "Hello"}]}

// null when assistant uses tools
{"role": "assistant", "content": null, "tool_calls": [...]}

tool_calls ToolCall[] | null assistant messages only

An array of tool calls the model wants to make. Each entry contains an id, type: "function", and a function object with name and arguments (a JSON string). Present when the model decides to call one or more tools instead of generating text content.

{
  "role": "assistant",
  "content": null,
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"Tokyo\"}"
    }
  }]
}

tool_call_id string | null tool messages only

Must match an id from a prior tool_calls entry. This is how the model associates a tool result with the call that requested it.

{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temp\": 22, \"unit\": \"celsius\"}"
}
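Putting the two message shapes together, a follow-up request carries the assistant's tool_calls entry and a tool message whose tool_call_id echoes the call's id (a sketch reusing the illustrative values from the examples above):

```python
import json

# The assistant message requesting a tool call (as returned by the model).
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": json.dumps({"location": "Tokyo"}),
        },
    }],
}

# Run the tool locally, then report the result back with the matching id.
tool_msg = {
    "role": "tool",
    "tool_call_id": assistant_msg["tool_calls"][0]["id"],
    "content": json.dumps({"temp": 22, "unit": "celsius"}),
}

# The next request includes the full history so far.
messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    assistant_msg,
    tool_msg,
]
```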

reasoning string | null assistant messages only

The model's reasoning text, if reasoning was enabled and not excluded. Used when passing assistant messages back for multi-turn conversations that include reasoning.

reasoning_details ReasoningDetail[] | null assistant messages only

Structured reasoning blocks returned by reasoning-capable models. These should be passed back verbatim in subsequent requests to maintain reasoning continuity across turns.


model string | null

Must be "sansa-auto" or null (omitted). Any other value returns a 400 error with code invalid_model. Sansa auto-routes to the best underlying model; you do not choose the model directly.


stream boolean default: false

If true, the response is delivered as Server-Sent Events (SSE). Compatible with OpenAI SDK streaming. Usage data is always included in the final streaming chunk. The stream_options parameter is accepted but ignored.

See Streaming for the full streaming reference.
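The loop below sketches client-side chunk handling, assuming the standard OpenAI SSE framing (each event is a "data: {...}" line and the stream ends with "data: [DONE]"); the sample lines are illustrative, not captured output:

```python
import json

# Illustrative SSE lines in the standard OpenAI chunk framing.
sample_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"choices": [{"delta": {}}], "usage": {"total_tokens": 12}}',
    "data: [DONE]",
]

text, usage = "", None
for line in sample_stream:
    data = line[len("data: "):]
    if data == "[DONE]":
        break
    chunk = json.loads(data)
    # Deltas carry incremental content; usage arrives in the final chunk.
    text += chunk["choices"][0]["delta"].get("content", "")
    usage = chunk.get("usage", usage)
```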


temperature number default: 1.0

Sampling temperature between 0 and 2. Higher values produce more random output; lower values produce more deterministic output. Forwarded to the underlying model. Sansa generally performs best when this is left at the default. Override only when you have a specific reason.


max_tokens integer optional deprecated

Maximum number of tokens to generate. Must be at least 1 if provided. Deprecated by OpenAI in favor of max_completion_tokens. Still accepted by Sansa. If both are set, max_completion_tokens takes precedence.


max_completion_tokens integer optional

Upper bound on the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Must be at least 1 if provided.

Takes precedence over max_tokens when both are set.


tools array of Tool optional

A list of tools the model may call. Each tool has type: "function" and a function object containing name, description, and parameters (a JSON Schema object).

See Tools for the full tool call lifecycle.
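A minimal tools entry might look like the following (get_weather is a hypothetical function used for illustration; parameters is a standard JSON Schema object):

```python
# One tool definition: type "function" plus a JSON Schema for its arguments.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name."},
            },
            "required": ["location"],
        },
    },
}]
```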


tool_choice string | object default: "auto"

Controls which tool (if any) the model calls.

  • "auto" -- model decides whether to call a tool or generate text. Default when tools is present.
  • "none" -- model will not call any tool. Default when no tools are provided.
  • "required" -- model must call at least one tool.
  • {"type": "function", "function": {"name": "my_function"}} -- forces the model to call the named function.

parallel_tool_calls boolean default: true

Whether to allow the model to make multiple tool calls in a single response. When true, the model can return multiple entries in the tool_calls array.


response_format object optional

Forces the model to produce output in a specific format.

  • {"type": "text"} -- default. Unstructured text output.
  • {"type": "json_object"} -- model output is guaranteed to be valid JSON.
  • {"type": "json_schema", "json_schema": {"name": "...", "strict": true, "schema": {...}}} -- structured outputs. The response conforms to the provided JSON Schema.

json_schema is preferred over json_object when the underlying model supports it.
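For structured outputs, the fragment below sketches a json_schema request (the "city_weather" schema is illustrative); the returned message content is then a JSON string conforming to the schema:

```python
import json

# Request structured output conforming to a JSON Schema.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "city_weather",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temp_c": {"type": "number"},
            },
            "required": ["city", "temp_c"],
            "additionalProperties": False,
        },
    },
}

# The message content comes back as a JSON string matching the schema;
# this sample stands in for a real response.
parsed = json.loads('{"city": "Tokyo", "temp_c": 22}')
```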


reasoning object optional

Reasoning (extended thinking) configuration, following the OpenRouter convention.

Field        Type      Description
effort       string    One of "xhigh", "high", "medium", "low", "minimal", "none".
max_tokens   integer   Direct token budget for reasoning.
exclude      boolean   If true, reasoning is used internally but not returned in the response.

If both reasoning and reasoning_effort are provided, reasoning takes precedence.

The router uses this as an input to model selection. The final reasoning effort applied may differ from what you requested.

See Reasoning for full details.
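For example, either of these payload fragments requests high reasoning effort (illustrative values; when both fields appear in one request, the reasoning object wins):

```python
# Equivalent ways to request high reasoning effort.
request_a = {"model": "sansa-auto", "reasoning": {"effort": "high", "exclude": False}}
request_b = {"model": "sansa-auto", "reasoning_effort": "high"}
```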


reasoning_effort string optional

Top-level OpenAI-style reasoning effort. Accepted values: "xhigh", "high", "medium", "low", "minimal", "none".

Normalized internally to a reasoning object. If both reasoning and reasoning_effort are present, reasoning takes precedence.


top_p number optional passed through

Nucleus sampling. Value between 0 and 1. The model considers only the tokens comprising the top top_p probability mass.

Forwarded directly to the underlying model. Not used by Sansa's routing logic. Altering both top_p and temperature simultaneously is generally not recommended.


stop string | array optional passed through

Up to 4 sequences where the model will stop generating further tokens. The returned text will not contain the stop sequence.

Forwarded directly to the underlying model.


frequency_penalty number optional passed through

Number between -2.0 and 2.0. Positive values penalize tokens based on their existing frequency in the text so far, reducing repetition.

Forwarded directly to the underlying model.


presence_penalty number optional passed through

Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared in the text, encouraging the model to cover new topics.

Forwarded directly to the underlying model.


metadata object optional

A set of up to 16 key-value pairs that can be attached to the request. Keys: max 64 characters. Values: max 512 characters.

The key call_name is special: its value labels the request in the Sansa dashboard. call_name must be 64 characters or less and cannot be empty or whitespace-only. There is no top-level call_name parameter -- it lives exclusively inside metadata.
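A client-side pre-check mirroring the documented limits can catch bad metadata before the request is sent (a sketch; the server enforces these limits regardless):

```python
# Mirror the documented limits: at most 16 pairs, keys <= 64 chars,
# values <= 512 chars, and call_name non-empty, non-whitespace, <= 64 chars.
def validate_metadata(metadata: dict) -> None:
    if len(metadata) > 16:
        raise ValueError("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64 or len(str(value)) > 512:
            raise ValueError(f"key or value too long for {key!r}")
    name = metadata.get("call_name")
    if name is not None and (not name.strip() or len(name) > 64):
        raise ValueError("call_name must be non-empty, non-whitespace, <= 64 chars")

validate_metadata({"call_name": "checkout-summarizer", "env": "prod"})
```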


n integer default: 1 not supported

Number of completions to generate. Only n=1 is accepted. Requests with n > 1 return a 400 error with code unsupported_parameter.


audio object not supported

Audio output configuration. Sansa is text-only. Providing this parameter returns a 400 error with code unsupported_parameter.


modalities array not supported

Output modalities. Only ["text"] is accepted. Including "audio" returns a 400 error with code unsupported_parameter.


web_search_options object not supported

Web search configuration. Not supported. Returns a 400 error with code unsupported_parameter.


functions array deprecated not supported

Legacy function definitions. Replaced by tools. Returns a 400 error with code unsupported_parameter.


function_call string | object deprecated not supported

Legacy function call control. Replaced by tool_choice. Returns a 400 error with code unsupported_parameter.


logit_bias object ignored

Accepted for compatibility. Silently ignored.


logprobs boolean ignored

Accepted for compatibility. Silently ignored.


top_logprobs integer ignored

Accepted for compatibility. Silently ignored.


seed integer ignored

Accepted for compatibility. Silently ignored.


stream_options object ignored

Accepted for compatibility. Silently ignored. Sansa always includes usage data in the final streaming chunk regardless of this setting.


prediction object ignored

Accepted for compatibility. Silently ignored.


store boolean ignored

Accepted for compatibility. Silently ignored.


service_tier string ignored

Accepted for compatibility. Silently ignored.


prompt_cache_key string ignored

Accepted for compatibility. Silently ignored.


prompt_cache_retention string ignored

Accepted for compatibility. Silently ignored.


safety_identifier string ignored

Accepted for compatibility. Silently ignored.


user string ignored

Accepted for compatibility. Silently ignored.


verbosity string ignored

Accepted for compatibility. Silently ignored.


Returns

A ChatCompletion object.

Field     Type      Description
id        string    Unique identifier for the completion.
object    string    Always "chat.completion".
created   integer   Unix timestamp of when the completion was created.
model     string    Always returns "sansa-auto".
choices   array     Completion choices. Always length 1.
usage     object    Token usage statistics.

choices[].message

Field              Type            Description
role               string          Always "assistant".
content            string | null   Text content. null when tool_calls is present.
tool_calls         array | null    Tool calls the model wants to make, if any.
reasoning          string | null   Reasoning text, if reasoning was enabled and not excluded.
reasoning_details  array | null    Structured reasoning blocks.

choices[].finish_reason

Value           Meaning
stop            Natural completion or stop sequence hit.
length          Hit the max_tokens / max_completion_tokens limit.
tool_calls      Model wants to call one or more tools.
content_filter  Content was filtered by the underlying provider.
error           Error during generation.

usage

Field              Type      Description
prompt_tokens      integer   Number of tokens in the input.
completion_tokens  integer   Number of tokens in the output.
total_tokens       integer   Sum of prompt_tokens and completion_tokens.

Errors

All errors follow this format:

{
  "error": {
    "code": "error_code",
    "message": "Human-readable description."
  }
}
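A client can dispatch on the envelope's code field; the helper below is a sketch (the sample body and the retryable set are illustrative choices, not part of the API):

```python
import json

# Pull the machine-readable code out of the documented error envelope.
def error_code(response_body: str) -> str:
    return json.loads(response_body)["error"]["code"]

body = '{"error": {"code": "insufficient_credits", "message": "Balance is zero."}}'
code = error_code(body)

# Example policy: retry transient failures, surface everything else.
retryable = code in {"rate_limit_exceeded", "internal_error"}
```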
Status  Code                   Cause
400     invalid_model          model is not "sansa-auto" or null.
400     invalid_call_name      call_name is empty, whitespace-only, or exceeds 64 characters.
400     unsupported_modality   Image or audio content detected in messages.
400     unsupported_parameter  n > 1, audio, web_search_options, functions, or function_call provided.
401     unauthorized           Invalid or missing API key.
402     insufficient_credits   Account balance is zero or negative.
429     rate_limit_exceeded    Too many requests.
500     internal_error         Unexpected server error.

Billing

  • Credits are checked before each request.
  • Cost is estimated before the request and reserved from your balance.
  • After the response completes, cost is recalculated using actual token usage and the difference is refunded or deducted.
  • The usage field in the response reflects actual token consumption.
  • Pricing is per-million tokens, billed separately for input and output.
  • Failed requests are not charged.