Completions

API reference for the chat completions endpoint

Creates a model response for the given chat conversation. Compatible with any OpenAI SDK. Sansa auto-routes requests to the best underlying model based on conversation content, tools, and reasoning configuration.

POST /v1/chat/completions



Body Parameters

messages

Required. An array of messages comprising the conversation so far. Sansa supports text, image, and audio inputs. Messages can include image_url (URL or base64) and input_audio (base64) content parts — these are routed to a multimodal model automatically.

To transcribe a standalone audio file, use the Transcriptions endpoint instead.

interface Message {
  // One of "system", "user", "assistant", "tool".
  // The "developer" role is also accepted and treated as "system".
  role: "system" | "user" | "assistant" | "tool";

  // The text content of the message.
  // Can be a plain string or an array of content parts.
  // Set to null on assistant messages that contain tool_calls.
  content: string | ContentPart[] | null;

  // An array of tool calls the model wants to make.
  // Present when the model decides to call tools instead of generating text.
  tool_calls?: ToolCall[];

  // Must match an id from a prior tool_calls entry.
  // Required for tool messages.
  tool_call_id?: string;
  
  // The model's reasoning text (assistant messages only).
  reasoning?: string;

  // Structured reasoning blocks (assistant messages only).
  // When echoing back assistant messages, pass these unmodified
  // to maintain reasoning continuity. See the Reasoning docs.
  reasoning_details?: ReasoningDetail[];
}

interface ContentPart {
  type: "text" | "image_url" | "input_audio";

  // For type "text".
  text?: string;

  // For type "image_url". Accepts a public URL or a base64 data URI.
  // data URI format: "data:<mime>;base64,<data>"
  image_url?: {
    url: string;
  };

  // For type "input_audio". Base64-encoded audio bytes.
  input_audio?: {
    // Base64-encoded audio bytes.
    data: string;
    // Audio format: "mp3", "wav", "m4a", "webm", "ogg", "flac", or "opus".
    format: string;
  };
}
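As a sketch, a multimodal user message combines these content parts in one array. The base64 payload and MIME type below are placeholders, not a real image:

```typescript
// Sketch: building a multimodal user message. The base64 payload and
// MIME type are placeholders, not a real image.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }
  | { type: "input_audio"; input_audio: { data: string; format: string } };

function imagePart(base64: string, mime: string): ContentPart {
  // data URI format from the spec above: "data:<mime>;base64,<data>"
  return { type: "image_url", image_url: { url: `data:${mime};base64,${base64}` } };
}

const userMessage = {
  role: "user" as const,
  content: [
    { type: "text", text: "What is in this picture?" } as ContentPart,
    imagePart("iVBORw0KGgo=", "image/png"),
  ],
};
```

A message shaped like this is routed to a multimodal model automatically; no extra flag is needed.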

model

Optional. Any value is accepted — "sansa-auto", null, or any existing model name you're already using. Sansa ignores this field and always auto-routes to the best underlying model. The response always returns "sansa-auto" in the model field.

This means you can drop Sansa in by changing only baseURL and apiKey — no need to update every model call in your codebase.
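A hedged sketch of what that drop-in looks like at the HTTP level; the base URL below is an assumption (check your dashboard for the real one), and only the URL and key differ from a direct OpenAI setup:

```typescript
// Sketch: a drop-in request builder. BASE_URL is hypothetical.
const BASE_URL = "https://api.sansa.example/v1";

function buildChatRequest(apiKey: string, messages: object[]) {
  return {
    url: `${BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // model can be anything (or your existing model name):
    // Sansa ignores it and auto-routes regardless.
    body: JSON.stringify({ model: "sansa-auto", messages }),
  };
}
```

With an OpenAI SDK, the equivalent change is setting baseURL and apiKey on the client and leaving every existing model string untouched.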

stream

Default: false. If true, the response is delivered as Server-Sent Events (SSE). Compatible with OpenAI SDK streaming. Usage data is always included in the final streaming chunk.
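If you consume the SSE stream manually rather than through an SDK, each event line carries a chunk in the OpenAI streaming format. A minimal sketch of pulling the content deltas out (the sample lines below are illustrative):

```typescript
// Sketch: extracting content deltas from raw SSE text. The chunk shape
// follows the OpenAI streaming format; "[DONE]" is the end sentinel.
function extractDeltas(sseText: string): string[] {
  const deltas: string[] = [];
  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    const text = chunk.choices?.[0]?.delta?.content;
    if (typeof text === "string") deltas.push(text);
  }
  return deltas;
}
```

The final chunk before [DONE] also carries the usage object, since usage is always included when streaming.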

temperature

Default: 1.0. Sampling temperature between 0 and 2. Higher values produce more random output; lower values produce more deterministic output. Forwarded to the underlying model. Sansa generally performs best when this is left at the default.

max_completion_tokens

Optional. Upper bound on the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Must be at least 1 if provided. Takes precedence over max_tokens.

max_tokens

Optional. Legacy alias for max_completion_tokens. Accepted and used as a fallback when max_completion_tokens is not provided. max_completion_tokens takes precedence if both are set.
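The precedence rule between the two limits reduces to a one-line fallback, sketched here:

```typescript
// Sketch: max_completion_tokens wins; max_tokens is the legacy fallback.
function resolveTokenLimit(
  maxCompletionTokens?: number,
  maxTokens?: number,
): number | undefined {
  return maxCompletionTokens ?? maxTokens;
}
```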

tools

Optional. A list of tools the model may call.

interface Tool {
  type: "function";
  function: {
    name: string;
    description?: string;
    // A JSON Schema object
    parameters: object;
  };
}

tool_choice

Default: "auto". Controls which tool (if any) the model calls.

// Model decides whether to call a tool or generate text.
"auto"

// Model will not call any tool.
"none"

// Model must call at least one tool.
"required"

// Forces the model to call the named function.
{ type: "function", function: { name: "my_function" } }
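Putting the two together, here is a sketch of a hypothetical get_weather tool and a tool_choice that forces it; the tool name and schema are illustrative:

```typescript
// Sketch: a hypothetical get_weather tool definition plus a forced
// tool_choice. The name and parameter schema are illustrative.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather",
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

// Forces the model to call get_weather instead of answering in text.
const tool_choice = {
  type: "function" as const,
  function: { name: "get_weather" },
};
```

With this tool_choice, the response's finish_reason will be "tool_calls" and the assistant message will carry the call arguments rather than text content.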

parallel_tool_calls

Default: true. Whether to allow the model to make multiple tool calls in a single response.

response_format

Optional. Forces the model to produce output in a specific format.

// Default. Unstructured text output.
{ type: "text" }

// Model output is guaranteed to be valid JSON.
{ type: "json_object" }

// Structured outputs. The response conforms to the provided JSON Schema.
{
  type: "json_schema",
  json_schema: {
    name: "...",
    strict: true,
    schema: { ... }
  }
}
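As a concrete sketch, a structured-output fragment for a hypothetical email summary (the name and schema below are illustrative, not a prescribed shape):

```typescript
// Sketch: a json_schema response_format. With strict: true, the model's
// output conforms to this schema, so it can be parsed directly.
const response_format = {
  type: "json_schema" as const,
  json_schema: {
    name: "email_summary",
    strict: true,
    schema: {
      type: "object",
      properties: {
        subject: { type: "string" },
        action_items: { type: "array", items: { type: "string" } },
      },
      required: ["subject", "action_items"],
      additionalProperties: false,
    },
  },
};

// A conforming message content then parses without defensive checks:
const sampleContent = '{"subject":"Q3 review","action_items":["send deck"]}';
const parsed = JSON.parse(sampleContent);
```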

reasoning

Optional. Reasoning (extended thinking) configuration, following the OpenRouter convention.

{
  // One of "xhigh", "high", "medium", "low", "minimal", "none".
  effort: "medium",

  // Direct token budget for reasoning.
  max_tokens: 1000,

  // If true, reasoning is used internally but not returned in the response.
  exclude: false
}

If both reasoning and reasoning_effort are provided, reasoning takes precedence.

reasoning_effort

Optional. Top-level OpenAI-style reasoning effort. Accepted values: "xhigh", "high", "medium", "low", "minimal", "none".

Normalized internally to a reasoning object. If both reasoning and reasoning_effort are present, reasoning takes precedence.
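The normalization described above can be sketched as a small resolver; this mirrors the documented precedence, not Sansa's internal code:

```typescript
// Sketch of the documented normalization: a top-level reasoning_effort
// becomes a reasoning object, and an explicit reasoning object wins.
type Reasoning = { effort?: string; max_tokens?: number; exclude?: boolean };

function normalizeReasoning(body: {
  reasoning?: Reasoning;
  reasoning_effort?: string;
}): Reasoning | undefined {
  if (body.reasoning) return body.reasoning; // reasoning takes precedence
  if (body.reasoning_effort) return { effort: body.reasoning_effort };
  return undefined;
}
```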

top_p

Optional. Nucleus sampling. Value between 0 and 1. The model considers only the tokens comprising the top_p probability mass; for example, 0.1 restricts sampling to the tokens in the top 10% of probability mass.

stop

Optional. Up to 4 sequences where the model will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty

Optional. Number between -2.0 and 2.0. Positive values penalize tokens based on their existing frequency in the text so far, reducing repetition.

presence_penalty

Optional. Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared in the text, encouraging the model to cover new topics.

metadata

Optional. A set of up to 16 key-value pairs attached to the request. Keys must be ≤ 64 characters; values must be ≤ 512 characters.

Use metadata["call_name"] to tag a request with a label. Tagged requests can be filtered in your dashboard.

  • call_name must be ≤ 64 characters and cannot be empty or whitespace-only.
  • Invalid call_name values return 400 with code invalid_call_name.

{
  "call_name": "summarize-email"
}

Currently, call_name is the only metadata key that Sansa reads. Other keys are accepted for OpenAI SDK compatibility but are not stored.
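The call_name rules above amount to a simple client-side check you can run before sending, sketched here:

```typescript
// Sketch of the documented call_name rules: non-empty, not
// whitespace-only, and at most 64 characters.
function isValidCallName(callName: string): boolean {
  return callName.trim().length > 0 && callName.length <= 64;
}
```

Validating locally avoids a round-trip that would otherwise end in a 400 with code invalid_call_name.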


Returns

A ChatCompletion object.

interface ChatCompletion {
  // Unique identifier for the completion.
  id: string;

  // Always "chat.completion".
  object: "chat.completion";

  // Unix timestamp of when the completion was created.
  created: number;

  // Always returns "sansa-auto".
  model: "sansa-auto";

  // Completion choices. Always length 1.
  choices: {
    // Always 0.
    index: number;

    message: {
      // Always "assistant".
      role: "assistant";
      
      // Text content. null when tool_calls is present.
      content: string | null;
      
      // Tool calls the model wants to make, if any.
      tool_calls: ToolCall[] | null;
      
      // Reasoning text, if reasoning was enabled and not excluded.
      reasoning: string | null;
      
      // Structured reasoning blocks.
      reasoning_details: ReasoningDetail[] | null;
    };
    
    // Reason why the generation stopped.
    // "stop": Natural completion or stop sequence hit.
    // "length": Hit max_tokens / max_completion_tokens limit.
    // "tool_calls": Model wants to call one or more tools.
    // "content_filter": Content was filtered by the underlying provider.
    // "error": Error during generation.
    finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | "error";
  }[];

  // Token usage statistics.
  usage: {
    // Number of tokens in the input.
    prompt_tokens: number;

    // Number of tokens in the output.
    completion_tokens: number;

    // Sum of prompt_tokens and completion_tokens.
    total_tokens: number;
  };
}
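A typical consumer branches on finish_reason before touching content. A sketch, using sample choices in the shape above:

```typescript
// Sketch: branching on finish_reason. The Choice type mirrors the
// response shape above; sample values in the test are illustrative.
type Choice = {
  finish_reason: string;
  message: { content: string | null; tool_calls: unknown[] | null };
};

function describeOutcome(choice: Choice): string {
  switch (choice.finish_reason) {
    case "tool_calls":
      return `model requested ${choice.message.tool_calls?.length ?? 0} tool call(s)`;
    case "length":
      return "truncated: raise max_completion_tokens";
    default:
      // "stop" (and other terminal reasons): content carries the text.
      return choice.message.content ?? "";
  }
}
```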

Errors

All errors follow this format:

{
  "error": {
    "code": "error_code",
    "message": "Human-readable description."
  }
}
HTTP Status  Code                   When
400          invalid_call_name      metadata["call_name"] is empty, whitespace-only, or exceeds 64 characters
400          unsupported_parameter  n > 1, audio, modalities containing "audio", web_search_options, functions, or function_call provided
401          unauthorized           Invalid or missing API key
402          insufficient_credits   Account balance is zero or negative
429          rate_limit_exceeded    Too many requests
500          internal_error         Unexpected server error

Note on 402: The insufficient_credits error uses HTTP 402, which is not part of OpenAI's standard error set. The OpenAI SDK surfaces it as a generic APIStatusError rather than a named error type. Catch it by checking error.status === 402 or error.code === "insufficient_credits".
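A sketch of that check; the status and error.code field names are assumptions based on the documented error format and the note above, so adapt them to however your SDK surfaces HTTP errors:

```typescript
// Sketch: spotting the non-standard 402 from a generic error object.
// Field names (status, error.code) are assumptions; adjust to your SDK.
function isInsufficientCredits(err: {
  status?: number;
  error?: { code?: string };
}): boolean {
  return err.status === 402 || err.error?.code === "insufficient_credits";
}
```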


Unsupported & Ignored Parameters

The following parameters are not supported and will return a 400 error:

  • n > 1 (n=1 or omitting n is fine)
  • audio
  • modalities containing "audio" (omitting modalities or passing ["text"] is fine)
  • web_search_options
  • functions (deprecated, use tools)
  • function_call (deprecated, use tool_choice)

The following parameters are ignored (accepted for compatibility but have no effect):

  • logit_bias
  • logprobs, top_logprobs
  • seed
  • stream_options (usage is always included in the final streaming chunk regardless)
  • prediction
  • store
  • service_tier
  • prompt_cache_key, prompt_cache_retention
  • safety_identifier
  • user
  • verbosity
  • tools[].function.strict (OpenAI structured tool outputs; accepted but silently dropped — the field has no effect)

Billing

  • Credits are checked before each request.
  • Cost is estimated before the request and reserved from your balance.
  • After the response completes, cost is recalculated using actual token usage and the difference is refunded or deducted.
  • The usage field in the response reflects actual token consumption.
  • Pricing is per-million tokens, billed separately for input and output.
  • Failed requests are not charged.
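Per-million billing works out to simple arithmetic; the prices below are placeholders (real prices depend on the model Sansa routes to):

```typescript
// Sketch: per-million-token billing with hypothetical prices.
// Input and output are billed separately, as described above.
function costUSD(
  promptTokens: number,
  completionTokens: number,
  inputPricePerM: number,   // USD per 1M input tokens (hypothetical)
  outputPricePerM: number,  // USD per 1M output tokens (hypothetical)
): number {
  return (
    (promptTokens / 1_000_000) * inputPricePerM +
    (completionTokens / 1_000_000) * outputPricePerM
  );
}
```

Plugging the usage object from a response into a function like this reproduces the post-response recalculation step.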