Completions
API reference for the chat completions endpoint
Creates a model response for the given chat conversation. Compatible with any OpenAI SDK. Sansa auto-routes requests to the best underlying model based on conversation content, tools, and reasoning configuration.
POST /v1/chat/completions
See the code panel for request and response examples.
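As a minimal illustration of a request body (parameter names as documented below; this snippet only builds the JSON, it does not send the request):

```python
import json

# Minimal request body for POST /v1/chat/completions. "sansa-auto" is the
# only accepted model value; the field may also be omitted entirely.
payload = {
    "model": "sansa-auto",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
}

body = json.dumps(payload)
```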
Body Parameters
messages Message[] required
An array of messages comprising the conversation so far. Sansa supports text content only. Image URLs and audio inputs return a 400 error with code "unsupported_modality".
role string required
One of "system", "user", "assistant", "tool".
{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "..."}
{"role": "tool", "content": "..."}content string | ContentPart[] | null
The text content of the message. Can be a plain string or an array of content parts. Only type: "text" content parts are accepted; type: "image_url" returns a 400 error. Set to null on assistant messages that contain tool_calls instead of text.
// String content
{"role": "user", "content": "Hello"}
// ContentPart array (text only)
{"role": "user", "content": [{"type": "text", "text": "Hello"}]}
// null when assistant uses tools
{"role": "assistant", "content": null, "tool_calls": [...]}tool_calls ToolCall[] | null assistant messages only
An array of tool calls the model wants to make. Each entry contains an id, type: "function", and a function object with name and arguments (a JSON string). Present when the model decides to call one or more tools instead of generating text content.
{
"role": "assistant",
"content": null,
"tool_calls": [{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\": \"Tokyo\"}"
}
}]
}
tool_call_id string | null tool messages only
Must match an id from a prior tool_calls entry. This is how the model associates a tool result with the call that requested it.
{
"role": "tool",
"tool_call_id": "call_abc123",
"content": "{\"temp\": 22, \"unit\": \"celsius\"}"
}
reasoning string | null assistant messages only
The model's reasoning text, if reasoning was enabled and not excluded. Used when passing assistant messages back for multi-turn conversations that include reasoning.
reasoning_details ReasoningDetail[] | null assistant messages only
Structured reasoning blocks returned by reasoning-capable models. These should be passed back verbatim in subsequent requests to maintain reasoning continuity across turns.
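A sketch of the verbatim pass-back. The reasoning_details entry shown here is made up; treat the real shape as opaque and echo it back unchanged on the next turn:

```python
# Illustrative reasoning_details entry; the internal shape of each detail
# is provider-defined and should be treated as opaque.
assistant_msg = {
    "role": "assistant",
    "content": "The answer is 42.",
    "reasoning": "Compared the candidate answers...",
    "reasoning_details": [{"type": "reasoning.text", "text": "..."}],
}

# Next-turn request: include the assistant message, details untouched.
messages = [
    {"role": "user", "content": "What is the answer?"},
    assistant_msg,
    {"role": "user", "content": "Why?"},
]
```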
model string | null
Must be "sansa-auto" or null (omitted). Any other value returns a 400 error with code invalid_model. Sansa auto-routes to the best underlying model. You do not choose the model directly.
stream boolean default: false
If true, the response is delivered as Server-Sent Events (SSE). Compatible with OpenAI SDK streaming. Usage data is always included in the final streaming chunk. The stream_options parameter is accepted but ignored.
See Streaming for the full streaming reference.
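A sketch of consuming a stream by concatenating content deltas. The chunks here are mock dicts in the OpenAI chunk shape (choices[0].delta.content), not a live SSE stream:

```python
# Mock chunks; a real stream arrives as Server-Sent Events. Per the docs,
# usage data is always included in the final chunk.
mock_chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}],
     "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}},
]

text = ""
usage = None
for chunk in mock_chunks:
    delta = chunk["choices"][0]["delta"]
    text += delta.get("content") or ""   # final delta may be empty
    usage = chunk.get("usage") or usage  # keep the last usage seen
```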
temperature number default: 1.0
Sampling temperature between 0 and 2. Higher values produce more random output; lower values produce more deterministic output. Forwarded to the underlying model. Sansa generally performs best when this is left at the default. Override only when you have a specific reason.
max_tokens integer optional deprecated
Maximum number of tokens to generate. Must be at least 1 if provided. Deprecated by OpenAI in favor of max_completion_tokens. Still accepted by Sansa. If both are set, max_completion_tokens takes precedence.
max_completion_tokens integer optional
Upper bound on the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Must be at least 1 if provided.
Takes precedence over max_tokens when both are set.
tools array of Tool optional
A list of tools the model may call. Each tool has type: "function" and a function object containing name, description, and parameters (a JSON Schema object).
See Tools for the full tool call lifecycle.
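A sketch of a tools array with one hypothetical get_weather function:

```python
# One tool: type "function" plus a function object with name, description,
# and a JSON Schema "parameters" object. get_weather is hypothetical.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}]
```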
tool_choice string | object default: "auto"
Controls which tool (if any) the model calls.
"auto"-- model decides whether to call a tool or generate text. Default whentoolsis present."none"-- model will not call any tool. Default when notoolsare provided."required"-- model must call at least one tool.{"type": "function", "function": {"name": "my_function"}}-- forces the model to call the named function.
parallel_tool_calls boolean default: true
Whether to allow the model to make multiple tool calls in a single response. When true, the model can return multiple entries in the tool_calls array.
response_format object optional
Forces the model to produce output in a specific format.
{"type": "text"}-- default. Unstructured text output.{"type": "json_object"}-- model output is guaranteed to be valid JSON.{"type": "json_schema", "json_schema": {"name": "...", "strict": true, "schema": {...}}}-- structured outputs. The response conforms to the provided JSON Schema.
json_schema is preferred over json_object when the underlying model supports it.
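A sketch of a json_schema response format with a hypothetical weather_report schema, and parsing the (illustrative) conforming output:

```python
import json

# Hypothetical weather_report schema for structured outputs.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temp_c": {"type": "number"},
            },
            "required": ["city", "temp_c"],
            "additionalProperties": False,
        },
    },
}

# With this format, message content parses as JSON conforming to the
# schema (the content below is illustrative).
report = json.loads('{"city": "Tokyo", "temp_c": 22}')
```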
reasoning object optional
Reasoning (extended thinking) configuration, following the OpenRouter convention.
| Field | Type | Description |
|---|---|---|
effort | string | One of "xhigh", "high", "medium", "low", "minimal", "none". |
max_tokens | integer | Direct token budget for reasoning. |
exclude | boolean | If true, reasoning is used internally but not returned in the response. |
If both reasoning and reasoning_effort are provided, reasoning takes precedence.
The router uses this as an input to model selection. The final reasoning effort applied may differ from what you requested.
See Reasoning for full details.
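The precedence rule can be sketched as follows; effective_reasoning is an illustrative helper mimicking the documented behavior, not part of the API:

```python
def effective_reasoning(req: dict):
    """Documented precedence: reasoning > reasoning_effort > nothing."""
    if req.get("reasoning") is not None:
        return req["reasoning"]
    if req.get("reasoning_effort") is not None:
        # reasoning_effort is normalized internally to a reasoning object.
        return {"effort": req["reasoning_effort"]}
    return None

# reasoning wins over reasoning_effort when both are present.
request_fragment = {
    "reasoning": {"effort": "high", "exclude": False},
    "reasoning_effort": "low",
}
chosen = effective_reasoning(request_fragment)
```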
reasoning_effort string optional
Top-level OpenAI-style reasoning effort. Accepted values: "xhigh", "high", "medium", "low", "minimal", "none".
Normalized internally to a reasoning object. If both reasoning and reasoning_effort are present, reasoning takes precedence.
top_p number optional passed through
Nucleus sampling. Value between 0 and 1. The model considers only the tokens comprising the top top_p probability mass.
Forwarded directly to the underlying model. Not used by Sansa's routing logic. Altering both top_p and temperature simultaneously is generally not recommended.
stop string | array optional passed through
Up to 4 sequences where the model will stop generating further tokens. The returned text will not contain the stop sequence.
Forwarded directly to the underlying model.
frequency_penalty number optional passed through
Number between -2.0 and 2.0. Positive values penalize tokens based on their existing frequency in the text so far, reducing repetition.
Forwarded directly to the underlying model.
presence_penalty number optional passed through
Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared in the text, encouraging the model to cover new topics.
Forwarded directly to the underlying model.
metadata object optional
A set of up to 16 key-value pairs that can be attached to the request. Keys: max 64 characters. Values: max 512 characters.
The key call_name is special: its value labels the request in the Sansa dashboard. call_name must be 64 characters or less and cannot be empty or whitespace-only. There is no top-level call_name parameter -- it lives exclusively inside metadata.
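The constraints above can be sketched as a client-side pre-check. validate_metadata is an illustrative helper, not part of any SDK; the server enforces these limits regardless:

```python
def validate_metadata(metadata: dict) -> None:
    """Mirror the documented limits: <= 16 pairs, keys <= 64 chars,
    values <= 512 chars, call_name non-empty and <= 64 chars."""
    if len(metadata) > 16:
        raise ValueError("at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"key too long: {key!r}")
        if len(str(value)) > 512:
            raise ValueError(f"value too long for key: {key!r}")
    call_name = metadata.get("call_name")
    if call_name is not None and (not call_name.strip() or len(call_name) > 64):
        raise ValueError("invalid_call_name")

validate_metadata({"call_name": "nightly-eval", "env": "prod"})  # ok

try:
    validate_metadata({"call_name": "   "})  # whitespace-only: rejected
    rejected = False
except ValueError:
    rejected = True
```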
n integer default: 1 not supported
Number of completions to generate. Only n=1 is accepted. Requests with n > 1 return a 400 error with code unsupported_parameter.
audio object not supported
Audio output configuration. Sansa is text-only. Providing this parameter returns a 400 error with code unsupported_parameter.
modalities array not supported
Output modalities. Only ["text"] is accepted. Including "audio" returns a 400 error with code unsupported_parameter.
web_search_options object not supported
Web search configuration. Not supported. Returns a 400 error with code unsupported_parameter.
functions array deprecated not supported
Legacy function definitions. Replaced by tools. Returns a 400 error with code unsupported_parameter.
function_call string | object deprecated not supported
Legacy function call control. Replaced by tool_choice. Returns a 400 error with code unsupported_parameter.
logit_bias object ignored
Accepted for compatibility. Silently ignored.
logprobs boolean ignored
Accepted for compatibility. Silently ignored.
top_logprobs integer ignored
Accepted for compatibility. Silently ignored.
seed integer ignored
Accepted for compatibility. Silently ignored.
stream_options object ignored
Accepted for compatibility. Silently ignored. Sansa always includes usage data in the final streaming chunk regardless of this setting.
prediction object ignored
Accepted for compatibility. Silently ignored.
store boolean ignored
Accepted for compatibility. Silently ignored.
service_tier string ignored
Accepted for compatibility. Silently ignored.
prompt_cache_key string ignored
Accepted for compatibility. Silently ignored.
prompt_cache_retention string ignored
Accepted for compatibility. Silently ignored.
safety_identifier string ignored
Accepted for compatibility. Silently ignored.
user string ignored
Accepted for compatibility. Silently ignored.
verbosity string ignored
Accepted for compatibility. Silently ignored.
Returns
A ChatCompletion object.
| Field | Type | Description |
|---|---|---|
id | string | Unique identifier for the completion. |
object | string | Always "chat.completion". |
created | integer | Unix timestamp of when the completion was created. |
model | string | Always returns "sansa-auto". |
choices | array | Completion choices. Always length 1. |
usage | object | Token usage statistics. |
choices[].message
| Field | Type | Description |
|---|---|---|
role | string | Always "assistant". |
content | string | null | Text content. null when tool_calls is present. |
tool_calls | array | null | Tool calls the model wants to make, if any. |
reasoning | string | null | Reasoning text, if reasoning was enabled and not excluded. |
reasoning_details | array | null | Structured reasoning blocks. |
choices[].finish_reason
| Value | Meaning |
|---|---|
stop | Natural completion or stop sequence hit. |
length | Hit max_tokens / max_completion_tokens limit. |
tool_calls | Model wants to call one or more tools. |
content_filter | Content was filtered by the underlying provider. |
error | Error during generation. |
usage
| Field | Type | Description |
|---|---|---|
prompt_tokens | integer | Number of tokens in the input. |
completion_tokens | integer | Number of tokens in the output. |
total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
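Putting the tables together, a response might be consumed like this (ids, timestamps, and content are made up):

```python
# Illustrative ChatCompletion object matching the field tables above.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "sansa-auto",
    "choices": [{
        "message": {"role": "assistant", "content": "Hello!",
                    "tool_calls": None},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3,
              "total_tokens": 15},
}

message = response["choices"][0]["message"]
usage = response["usage"]
```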
Errors
All errors follow this format:
{
"error": {
"code": "error_code",
"message": "Human-readable description."
}
}
| Status | Code | Cause |
|---|---|---|
400 | invalid_model | model is not "sansa-auto" or null. |
400 | invalid_call_name | call_name is empty, whitespace-only, or exceeds 64 characters. |
400 | unsupported_modality | Image or audio content detected in messages. |
400 | unsupported_parameter | n > 1, audio, web_search_options, functions, or function_call provided. |
401 | unauthorized | Invalid or missing API key. |
402 | insufficient_credits | Account balance is zero or negative. |
429 | rate_limit_exceeded | Too many requests. |
500 | internal_error | Unexpected server error. |
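A sketch of dispatching on the error envelope; which codes to retry is an application-level choice, not part of the API:

```python
import json

# Treat rate limits and server errors as retryable; everything else as
# a hard failure. This split is an application-level assumption.
RETRYABLE = {"rate_limit_exceeded", "internal_error"}

def classify(body: str) -> str:
    """Return "retry" or "fail" based on the error envelope's code."""
    code = json.loads(body)["error"]["code"]
    return "retry" if code in RETRYABLE else "fail"

raw = '{"error": {"code": "insufficient_credits", "message": "Balance is zero."}}'
decision = classify(raw)
```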
Billing
- Credits are checked before each request.
- Cost is estimated before the request and reserved from your balance.
- After the response completes, cost is recalculated using actual token usage and the difference is refunded or deducted.
- The usage field in the response reflects actual token consumption.
- Pricing is per-million tokens, billed separately for input and output.
- Failed requests are not charged.
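A worked example of the reserve-then-reconcile arithmetic, using hypothetical per-million prices (real rates are set by Sansa and are not part of this reference):

```python
# Hypothetical per-million prices, for illustration only.
INPUT_PRICE_PER_M = 1.50   # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 6.00  # USD per 1M output tokens (assumed)

def cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * INPUT_PRICE_PER_M
            + completion_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

reserved = cost_usd(2_000, 1_000)  # pre-request estimate
actual = cost_usd(1_800, 600)      # recomputed from the usage field
refund = reserved - actual         # refunded (or deducted if negative)
```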