Completions
API reference for the chat completions endpoint
Creates a model response for the given chat conversation. Compatible with any OpenAI SDK. Sansa auto-routes requests to the best underlying model based on conversation content, tools, and reasoning configuration.
POST /v1/chat/completions
See the code panel for request and response examples.
Body Parameters
messages
Required. An array of messages comprising the conversation so far. Sansa supports text, image, and audio inputs. Messages can include image_url (URL or base64) and input_audio (base64) content parts — these are routed to a multimodal model automatically.
To transcribe a standalone audio file, use the Transcriptions endpoint instead.
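As a quick sketch before the full schema, a single user message can mix content parts; the base64 payload below is truncated placeholder data, not a real image:

```typescript
// A user message combining a text part and an image part.
// The base64 payload is a truncated placeholder, not a real image.
const multimodalMessage = {
  role: "user",
  content: [
    { type: "text", text: "What is shown in this image?" },
    {
      type: "image_url",
      image_url: { url: "data:image/png;base64,iVBORw0KGgo..." },
    },
  ],
};
```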
interface Message {
// One of "system", "user", "assistant", "tool".
// The legacy "developer" role is also accepted and treated as "system".
role: "system" | "user" | "assistant" | "tool";
// The text content of the message.
// Can be a plain string or an array of content parts.
// Set to null on assistant messages that contain tool_calls.
content: string | ContentPart[] | null;
// An array of tool calls the model wants to make.
// Present when the model decides to call tools instead of generating text.
tool_calls?: ToolCall[];
// Must match an id from a prior tool_calls entry.
// Required for tool messages.
tool_call_id?: string;
// The model's reasoning text (assistant messages only).
reasoning?: string;
// Structured reasoning blocks (assistant messages only).
// When echoing back assistant messages, pass these unmodified
// to maintain reasoning continuity. See the Reasoning docs.
reasoning_details?: ReasoningDetail[];
}
interface ContentPart {
type: "text" | "image_url" | "input_audio";
// For type "text".
text?: string;
// For type "image_url". Accepts a public URL or a base64 data URI.
// data URI format: "data:<mime>;base64,<data>"
image_url?: {
url: string;
};
// For type "input_audio". Base64-encoded audio bytes.
input_audio?: {
// Base64-encoded audio bytes.
data: string;
// Audio format: "mp3", "wav", "m4a", "webm", "ogg", "flac", or "opus".
format: string;
};
}
model
Optional. Any value is accepted — "sansa-auto", null, or any existing model name you're already using. Sansa ignores this field and always auto-routes to the best underlying model. The response always returns "sansa-auto" in the model field.
This means you can drop Sansa in by changing only baseURL and apiKey — no need to update every model call in your codebase.
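A sketch of the raw HTTP request without an SDK. The base URL here is a placeholder, not Sansa's actual endpoint host; substitute your real base URL and API key:

```typescript
// Build the HTTP request for the endpoint. The base URL below is a
// placeholder; substitute your actual Sansa base URL and API key.
function buildCompletionRequest(
  apiKey: string,
  body: object,
): { url: string; method: string; headers: Record<string, string>; payload: string } {
  return {
    url: "https://example.invalid/v1/chat/completions", // placeholder host
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    payload: JSON.stringify(body),
  };
}

// An existing model name is fine: the field is ignored and auto-routed.
const req = buildCompletionRequest("YOUR_API_KEY", {
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});
```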
stream
Default: false. If true, the response is delivered as Server-Sent Events (SSE). Compatible with OpenAI SDK streaming. Usage data is always included in the final streaming chunk.
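Because usage arrives only in the final chunk, a streaming client must read SSE data lines to the end. A minimal sketch of a per-line parser (not the SDK's implementation):

```typescript
// Parse one line of an SSE stream. Returns the parsed chunk object,
// or null for the terminating "[DONE]" sentinel and for non-data lines
// (blank keep-alives, comments).
function parseSseLine(line: string): Record<string, unknown> | null {
  if (!line.startsWith("data:")) return null;
  const payload = line.slice("data:".length).trim();
  if (payload === "[DONE]") return null;
  return JSON.parse(payload);
}
```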
temperature
Default: 1.0. Sampling temperature between 0 and 2. Higher values produce more random output; lower values produce more deterministic output. Forwarded to the underlying model. Sansa generally performs best when this is left at the default.
max_completion_tokens
Optional. Upper bound on the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Must be at least 1 if provided. Takes precedence over max_tokens.
max_tokens
Optional. Legacy alias for max_completion_tokens. Accepted and used as a fallback when max_completion_tokens is not provided. max_completion_tokens takes precedence if both are set.
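The precedence rule can be expressed as a small helper (hypothetical name, mirroring the documented behavior):

```typescript
// Resolve the effective output-token cap per the documented precedence:
// max_completion_tokens wins; max_tokens is a legacy fallback.
function effectiveMaxTokens(body: {
  max_completion_tokens?: number;
  max_tokens?: number;
}): number | undefined {
  return body.max_completion_tokens ?? body.max_tokens;
}
```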
tools
Optional. A list of tools the model may call.
interface Tool {
type: "function";
function: {
name: string;
description?: string;
// A JSON Schema object
parameters: object;
};
}
tool_choice
Default: "auto". Controls which tool (if any) the model calls.
// Model decides whether to call a tool or generate text.
"auto"
// Model will not call any tool.
"none"
// Model must call at least one tool.
"required"
// Forces the model to call the named function.
{ type: "function", function: { name: "my_function" } }
parallel_tool_calls
Default: true. Whether to allow the model to make multiple tool calls in a single response.
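Putting tools, tool_choice, and parallel_tool_calls together, a request body that forces a call to a hypothetical get_weather function might look like:

```typescript
// A request body forcing a call to a (hypothetical) get_weather function.
const body = {
  messages: [{ role: "user", content: "Weather in Paris?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
  tool_choice: { type: "function", function: { name: "get_weather" } },
  parallel_tool_calls: false, // at most one tool call per response
};
```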
response_format
Optional. Forces the model to produce output in a specific format.
// Default. Unstructured text output.
{ type: "text" }
// Model output is guaranteed to be valid JSON.
{ type: "json_object" }
// Structured outputs. The response conforms to the provided JSON Schema.
{
type: "json_schema",
json_schema: {
name: "...",
strict: true,
schema: { ... }
}
}
reasoning
Optional. Reasoning (extended thinking) configuration, following the OpenRouter convention.
{
// One of "xhigh", "high", "medium", "low", "minimal", "none".
effort: "medium",
// Direct token budget for reasoning.
max_tokens: 1000,
// If true, reasoning is used internally but not returned in the response.
exclude: false
}
If both reasoning and reasoning_effort are provided, reasoning takes precedence.
reasoning_effort
Optional. Top-level OpenAI-style reasoning effort. Accepted values: "xhigh", "high", "medium", "low", "minimal", "none".
Normalized internally to a reasoning object. If both reasoning and reasoning_effort are present, reasoning takes precedence.
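The documented normalization can be sketched as follows (hypothetical function name; an explicit reasoning object wins, and a bare effort string is folded into one):

```typescript
interface Reasoning {
  effort?: string;
  max_tokens?: number;
  exclude?: boolean;
}

// Sketch of the documented normalization: a top-level reasoning_effort is
// folded into a reasoning object, and an explicit reasoning object wins.
function normalizeReasoning(body: {
  reasoning?: Reasoning;
  reasoning_effort?: string;
}): Reasoning | undefined {
  if (body.reasoning !== undefined) return body.reasoning;
  if (body.reasoning_effort !== undefined) return { effort: body.reasoning_effort };
  return undefined;
}
```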
top_p
Optional. Nucleus sampling. Value between 0 and 1. The model considers only the tokens comprising the top top_p probability mass.
stop
Optional. Up to 4 sequences where the model will stop generating further tokens. The returned text will not contain the stop sequence.
frequency_penalty
Optional. Number between -2.0 and 2.0. Positive values penalize tokens based on their existing frequency in the text so far, reducing repetition.
presence_penalty
Optional. Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared in the text, encouraging the model to cover new topics.
metadata
Optional. A set of up to 16 key-value pairs attached to the request. Keys must be ≤ 64 characters; values must be ≤ 512 characters.
Use metadata["call_name"] to tag a request with a label. Tagged requests can be filtered in your dashboard.
- call_name must be ≤ 64 characters and cannot be empty or whitespace-only.
- Invalid call_name values return 400 with code invalid_call_name.
{
"call_name": "summarize-email"
}
Currently, call_name is the only metadata key that Sansa reads. Other keys are accepted for OpenAI SDK compatibility but are not stored.
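A client-side preflight check mirroring the documented rules can avoid the round trip:

```typescript
// Mirror the documented validation: call_name must be non-empty, not
// whitespace-only, and at most 64 characters, or the request fails
// with 400 invalid_call_name.
function isValidCallName(callName: string): boolean {
  return callName.trim().length > 0 && callName.length <= 64;
}
```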
Returns
A ChatCompletion object.
interface ChatCompletion {
// Unique identifier for the completion.
id: string;
// Always "chat.completion".
object: "chat.completion";
// Unix timestamp of when the completion was created.
created: number;
// Always returns "sansa-auto".
model: "sansa-auto";
// Completion choices. Always length 1.
choices: {
// Always 0.
index: number;
message: {
// Always "assistant".
role: "assistant";
// Text content. null when tool_calls is present.
content: string | null;
// Tool calls the model wants to make, if any.
tool_calls: ToolCall[] | null;
// Reasoning text, if reasoning was enabled and not excluded.
reasoning: string | null;
// Structured reasoning blocks.
reasoning_details: ReasoningDetail[] | null;
};
// Reason why the generation stopped.
// "stop": Natural completion or stop sequence hit.
// "length": Hit max_tokens / max_completion_tokens limit.
// "tool_calls": Model wants to call one or more tools.
// "content_filter": Content was filtered by the underlying provider.
// "error": Error during generation.
finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | "error";
}[];
// Token usage statistics.
usage: {
// Number of tokens in the input.
prompt_tokens: number;
// Number of tokens in the output.
completion_tokens: number;
// Sum of prompt_tokens and completion_tokens.
total_tokens: number;
};
}
Errors
All errors follow this format:
{
"error": {
"code": "error_code",
"message": "Human-readable description."
}
}
| HTTP Status | Code | When |
|---|---|---|
| 400 | invalid_call_name | metadata["call_name"] is empty, whitespace-only, or exceeds 64 characters |
| 400 | unsupported_parameter | n > 1, audio, web_search_options, functions, or function_call provided |
| 401 | unauthorized | Invalid or missing API key |
| 402 | insufficient_credits | Account balance is zero or negative |
| 429 | rate_limit_exceeded | Too many requests |
| 500 | internal_error | Unexpected server error |
Note on 402: The insufficient_credits error uses HTTP 402, which is not part of OpenAI's standard error set. The OpenAI SDK surfaces it as a generic APIStatusError rather than a named error type. Catch it by checking error.status === 402 or error.code === "insufficient_credits".
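A minimal sketch of that check, given a generic SDK error object exposing the HTTP status and the error body's code:

```typescript
// Detect the non-standard 402 error from a generic SDK error, which
// exposes the HTTP status and the parsed error body's code.
function isInsufficientCredits(err: { status?: number; code?: string | null }): boolean {
  return err.status === 402 || err.code === "insufficient_credits";
}
```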
Unsupported & Ignored Parameters
The following parameters are not supported and will return a 400 error:
- n > 1 (n=1 or omitting n is fine)
- audio
- modalities containing "audio" (omitting modalities or passing ["text"] is fine)
- web_search_options
- functions (deprecated, use tools)
- function_call (deprecated, use tool_choice)
The following parameters are ignored (accepted for compatibility but have no effect):
- logit_bias
- logprobs, top_logprobs
- seed
- stream_options (usage is always included in the final streaming chunk regardless)
- prediction
- store
- service_tier
- prompt_cache_key, prompt_cache_retention
- safety_identifier
- user
- verbosity
- tools[].function.strict (OpenAI structured tool outputs; accepted but silently dropped, with no effect)
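A preflight helper mirroring the unsupported-parameter list above (hypothetical name; it returns the field names that would trigger a 400 unsupported_parameter error):

```typescript
// Return the names of fields that would trigger a 400
// unsupported_parameter error, per the documented list.
function findUnsupportedParams(body: Record<string, any>): string[] {
  const bad: string[] = [];
  if (typeof body.n === "number" && body.n > 1) bad.push("n");
  if ("audio" in body) bad.push("audio");
  if (Array.isArray(body.modalities) && body.modalities.includes("audio")) {
    bad.push("modalities");
  }
  for (const key of ["web_search_options", "functions", "function_call"]) {
    if (key in body) bad.push(key);
  }
  return bad;
}
```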
Billing
- Credits are checked before each request.
- Cost is estimated before the request and reserved from your balance.
- After the response completes, cost is recalculated using actual token usage and the difference is refunded or deducted.
- The usage field in the response reflects actual token consumption.
- Pricing is per-million tokens, billed separately for input and output.
- Failed requests are not charged.
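The reserve-and-settle flow above can be sketched numerically. The per-million prices here are illustrative placeholders, not Sansa's actual rates:

```typescript
// Per-million-token pricing, billed separately for input and output.
// Prices are illustrative placeholders, not Sansa's actual rates.
function cost(
  promptTokens: number,
  completionTokens: number,
  inputPerMillion: number,
  outputPerMillion: number,
): number {
  return (
    (promptTokens / 1_000_000) * inputPerMillion +
    (completionTokens / 1_000_000) * outputPerMillion
  );
}

// The estimate is reserved up front; after the response, the difference
// between the reservation and the actual cost is refunded or deducted.
const reserved = cost(2_000, 1_000, 3, 15); // estimated before the request
const actual = cost(1_800, 600, 3, 15);     // from the response's usage field
const refund = reserved - actual;           // positive => refunded
```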