Completions

API reference for the chat completions endpoint

Creates a model response for the given chat conversation. Compatible with any OpenAI SDK. Sansa auto-routes requests to the best underlying model based on conversation content, tools, and reasoning configuration.

POST /v1/chat/completions



Body Parameters

messages

Required. An array of messages comprising the conversation so far. Sansa supports text, image, and audio inputs. Messages can include image_url (URL or base64) and input_audio (base64) content parts — these are routed to a multimodal model automatically.

To transcribe a standalone audio file, use the Transcriptions endpoint instead.

interface Message {
  // One of "system", "user", "assistant", "tool".
  // The "developer" role is also accepted and treated as "system".
  role: "system" | "user" | "assistant" | "tool";

  // The text content of the message.
  // Can be a plain string or an array of content parts.
  // Set to null on assistant messages that contain tool_calls.
  content: string | ContentPart[] | null;

  // An array of tool calls the model wants to make.
  // Present when the model decides to call tools instead of generating text.
  tool_calls?: ToolCall[];

  // Must match an id from a prior tool_calls entry.
  // Required for tool messages.
  tool_call_id?: string;
  
  // The model's reasoning text (assistant messages only).
  reasoning?: string;

  // Structured reasoning blocks (assistant messages only).
  // When echoing back assistant messages, pass these unmodified
  // to maintain reasoning continuity. See the Reasoning docs.
  reasoning_details?: ReasoningDetail[];
}

interface ContentPart {
  type: "text" | "image_url" | "input_audio";

  // For type "text".
  text?: string;

  // For type "image_url". Accepts a public URL or a base64 data URI.
  // data URI format: "data:<mime>;base64,<data>"
  image_url?: {
    url: string;
  };

  // For type "input_audio". Base64-encoded audio bytes.
  input_audio?: {
    // Base64-encoded audio bytes.
    data: string;
    // Audio format: "mp3", "wav", "m4a", "webm", "ogg", "flac", or "opus".
    format: string;
  };
}
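As a sketch, a multimodal user message combines these content parts in one array. The base64 payload and MIME type below are placeholders, not a real image:

```typescript
// Sketch: building a multimodal user message. The base64 payload and
// MIME type are placeholders, not a real image.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }
  | { type: "input_audio"; input_audio: { data: string; format: string } };

function imagePart(base64: string, mime: string): ContentPart {
  // data URI format from the spec above: "data:<mime>;base64,<data>"
  return { type: "image_url", image_url: { url: `data:${mime};base64,${base64}` } };
}

const userMessage = {
  role: "user" as const,
  content: [
    { type: "text", text: "What is in this picture?" } as ContentPart,
    imagePart("iVBORw0KGgo=", "image/png"),
  ],
};
```

A message shaped like this is routed to a multimodal model automatically; no extra flag is needed.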

model

Optional. Any value is accepted — "sansa-auto", null, or any existing model name you're already using. Sansa ignores this field and always auto-routes to the best underlying model. The response always returns "sansa-auto" in the model field.

This means you can drop Sansa in by changing only baseURL and apiKey — no need to update every model call in your codebase.
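A hedged sketch of what that drop-in looks like at the HTTP level; the base URL below is an assumption (check your dashboard for the real one), and only the URL and key differ from a direct OpenAI setup:

```typescript
// Sketch: a drop-in request builder. BASE_URL is hypothetical.
const BASE_URL = "https://api.sansa.example/v1";

function buildChatRequest(apiKey: string, messages: object[]) {
  return {
    url: `${BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // model can be anything (or your existing model name):
    // Sansa ignores it and auto-routes regardless.
    body: JSON.stringify({ model: "sansa-auto", messages }),
  };
}
```

With an OpenAI SDK, the equivalent change is setting baseURL and apiKey on the client and leaving every existing model string untouched.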

stream

Default: false. If true, the response is delivered as Server-Sent Events (SSE). Compatible with OpenAI SDK streaming. Usage data is always included in the final streaming chunk.
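If you consume the SSE stream manually rather than through an SDK, each event line carries a chunk in the OpenAI streaming format. A minimal sketch of pulling the content deltas out (the sample lines below are illustrative):

```typescript
// Sketch: extracting content deltas from raw SSE text. The chunk shape
// follows the OpenAI streaming format; "[DONE]" is the end sentinel.
function extractDeltas(sseText: string): string[] {
  const deltas: string[] = [];
  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    const text = chunk.choices?.[0]?.delta?.content;
    if (typeof text === "string") deltas.push(text);
  }
  return deltas;
}
```

The final chunk before [DONE] also carries the usage object, since usage is always included when streaming.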

temperature

Default: 1.0. Sampling temperature between 0 and 2. Higher values produce more random output; lower values produce more deterministic output. Forwarded to the underlying model. Sansa generally performs best when this is left at the default.

max_completion_tokens

Optional. Upper bound on the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Must be at least 1 if provided. Takes precedence over max_tokens.

max_tokens

Optional. Legacy alias for max_completion_tokens. Accepted and used as a fallback when max_completion_tokens is not provided. max_completion_tokens takes precedence if both are set.
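The precedence rule between the two limits reduces to a one-line fallback, sketched here:

```typescript
// Sketch: max_completion_tokens wins; max_tokens is the legacy fallback.
function resolveTokenLimit(
  maxCompletionTokens?: number,
  maxTokens?: number,
): number | undefined {
  return maxCompletionTokens ?? maxTokens;
}
```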

tools

Optional. A list of tools the model may call.

interface Tool {
  type: "function";
  function: {
    name: string;
    description?: string;
    // A JSON Schema object
    parameters: object;
  };
}

tool_choice

Default: "auto". Controls which tool (if any) the model calls.

// Model decides whether to call a tool or generate text.
"auto"

// Model will not call any tool.
"none"

// Model must call at least one tool.
"required"

// Forces the model to call the named function.
{ type: "function", function: { name: "my_function" } }
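Putting the two together, here is a sketch of a hypothetical get_weather tool and a tool_choice that forces it; the tool name and schema are illustrative:

```typescript
// Sketch: a hypothetical get_weather tool definition plus a forced
// tool_choice. The name and parameter schema are illustrative.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather",
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

// Forces the model to call get_weather instead of answering in text.
const tool_choice = {
  type: "function" as const,
  function: { name: "get_weather" },
};
```

With this tool_choice, the response's finish_reason will be "tool_calls" and the assistant message will carry the call arguments rather than text content.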

parallel_tool_calls

Default: true. Whether to allow the model to make multiple tool calls in a single response.

response_format

Optional. Forces the model to produce output in a specific format.

// Default. Unstructured text output.
{ type: "text" }

// Model output is guaranteed to be valid JSON.
{ type: "json_object" }

// Structured outputs. The response conforms to the provided JSON Schema.
{
  type: "json_schema",
  json_schema: {
    name: "...",
    strict: true,
    schema: { ... }
  }
}
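As a concrete sketch, a structured-output fragment for a hypothetical email summary (the name and schema below are illustrative, not a prescribed shape):

```typescript
// Sketch: a json_schema response_format. With strict: true, the model's
// output conforms to this schema, so it can be parsed directly.
const response_format = {
  type: "json_schema" as const,
  json_schema: {
    name: "email_summary",
    strict: true,
    schema: {
      type: "object",
      properties: {
        subject: { type: "string" },
        action_items: { type: "array", items: { type: "string" } },
      },
      required: ["subject", "action_items"],
      additionalProperties: false,
    },
  },
};

// A conforming message content then parses without defensive checks:
const sampleContent = '{"subject":"Q3 review","action_items":["send deck"]}';
const parsed = JSON.parse(sampleContent);
```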

reasoning

Optional. Reasoning (extended thinking) configuration, following the OpenRouter convention.

{
  // One of "xhigh", "high", "medium", "low", "minimal", "none".
  effort: "medium",

  // Direct token budget for reasoning.
  max_tokens: 1000,

  // If true, reasoning is used internally but not returned in the response.
  exclude: false
}

If both reasoning and reasoning_effort are provided, reasoning takes precedence.

reasoning_effort

Optional. Top-level OpenAI-style reasoning effort. Accepted values: "xhigh", "high", "medium", "low", "minimal", "none".

Normalized internally to a reasoning object. If both reasoning and reasoning_effort are present, reasoning takes precedence.
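The normalization described above can be sketched as a small resolver; this mirrors the documented precedence, not Sansa's internal code:

```typescript
// Sketch of the documented normalization: a top-level reasoning_effort
// becomes a reasoning object, and an explicit reasoning object wins.
type Reasoning = { effort?: string; max_tokens?: number; exclude?: boolean };

function normalizeReasoning(body: {
  reasoning?: Reasoning;
  reasoning_effort?: string;
}): Reasoning | undefined {
  if (body.reasoning) return body.reasoning; // reasoning takes precedence
  if (body.reasoning_effort) return { effort: body.reasoning_effort };
  return undefined;
}
```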

top_p

Optional. Nucleus sampling. Value between 0 and 1. The model considers only the tokens comprising the top_p probability mass; for example, 0.1 restricts sampling to the tokens in the top 10% of probability mass.

stop

Optional. Up to 4 sequences where the model will stop generating further tokens. The returned text will not contain the stop sequence.

frequency_penalty

Optional. Number between -2.0 and 2.0. Positive values penalize tokens based on their existing frequency in the text so far, reducing repetition.

presence_penalty

Optional. Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared in the text, encouraging the model to cover new topics.

metadata

Optional. A set of up to 16 key-value pairs attached to the request. Keys must be ≤ 64 characters; values must be ≤ 512 characters.

Use metadata["call_name"] to tag a request with a label. Tagged requests can be filtered in your dashboard.

  • call_name must be ≤ 64 characters and cannot be empty or whitespace-only.
  • Invalid call_name values return 400 with code invalid_call_name.

{
  "call_name": "summarize-email"
}

Currently, call_name is the only metadata key that Sansa reads. Other keys are accepted for OpenAI SDK compatibility but are not stored.
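The call_name rules above amount to a simple client-side check you can run before sending, sketched here:

```typescript
// Sketch of the documented call_name rules: non-empty, not
// whitespace-only, and at most 64 characters.
function isValidCallName(callName: string): boolean {
  return callName.trim().length > 0 && callName.length <= 64;
}
```

Validating locally avoids a round-trip that would otherwise end in a 400 with code invalid_call_name.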


Returns

A ChatCompletion object.

interface ChatCompletion {
  // Unique identifier for the completion.
  id: string;

  // Always "chat.completion".
  object: "chat.completion";

  // Unix timestamp of when the completion was created.
  created: number;

  // Always returns "sansa-auto".
  model: "sansa-auto";

  // Completion choices. Always length 1.
  choices: {
    // Always 0.
    index: number;

    message: {
      // Always "assistant".
      role: "assistant";
      
      // Text content. null when tool_calls is present.
      content: string | null;
      
      // Tool calls the model wants to make, if any.
      tool_calls: ToolCall[] | null;
      
      // Reasoning text, if reasoning was enabled and not excluded.
      reasoning: string | null;
      
      // Structured reasoning blocks.
      reasoning_details: ReasoningDetail[] | null;
    };
    
    // Reason why the generation stopped.
    // "stop": Natural completion or stop sequence hit.
    // "length": Hit max_tokens / max_completion_tokens limit.
    // "tool_calls": Model wants to call one or more tools.
    // "content_filter": Content was filtered by the underlying provider.
    // "error": Error during generation.
    finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | "error";
  }[];

  // Token usage statistics.
  usage: {
    // Number of tokens in the input.
    prompt_tokens: number;

    // Number of tokens in the output.
    completion_tokens: number;

    // Sum of prompt_tokens and completion_tokens.
    total_tokens: number;
  };
}
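A typical consumer branches on finish_reason before touching content. A sketch, using sample choices in the shape above:

```typescript
// Sketch: branching on finish_reason. The Choice type mirrors the
// response shape above; sample values in the test are illustrative.
type Choice = {
  finish_reason: string;
  message: { content: string | null; tool_calls: unknown[] | null };
};

function describeOutcome(choice: Choice): string {
  switch (choice.finish_reason) {
    case "tool_calls":
      return `model requested ${choice.message.tool_calls?.length ?? 0} tool call(s)`;
    case "length":
      return "truncated: raise max_completion_tokens";
    default:
      // "stop" (and other terminal reasons): content carries the text.
      return choice.message.content ?? "";
  }
}
```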

Errors

All errors follow this format:

{
  "error": {
    "code": "error_code",
    "message": "Human-readable description."
  }
}
HTTP Status  Code                   When
400          invalid_call_name      metadata["call_name"] is empty, whitespace-only, or exceeds 64 characters
400          unsupported_parameter  n > 1, audio, modalities containing "audio", web_search_options, functions, or function_call provided
401          unauthorized           Invalid or missing API key
402          insufficient_credits   Account balance is zero or negative
429          rate_limit_exceeded    Too many requests
500          internal_error         Unexpected server error

Note on 402: The insufficient_credits error uses HTTP 402, which is not part of OpenAI's standard error set. The OpenAI SDK surfaces it as a generic APIStatusError rather than a named error type. Catch it by checking error.status === 402 or error.code === "insufficient_credits".
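A sketch of that check; the status and error.code field names are assumptions based on the documented error format and the note above, so adapt them to however your SDK surfaces HTTP errors:

```typescript
// Sketch: spotting the non-standard 402 from a generic error object.
// Field names (status, error.code) are assumptions; adjust to your SDK.
function isInsufficientCredits(err: {
  status?: number;
  error?: { code?: string };
}): boolean {
  return err.status === 402 || err.error?.code === "insufficient_credits";
}
```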


Unsupported & Ignored Parameters

The following parameters are not supported and will return a 400 error:

  • n > 1 (n=1 or omitting n is fine)
  • audio
  • modalities containing "audio" (omitting modalities or passing ["text"] is fine)
  • web_search_options
  • functions (deprecated, use tools)
  • function_call (deprecated, use tool_choice)

The following parameters are ignored (accepted for compatibility but have no effect):

  • logit_bias
  • logprobs, top_logprobs
  • seed
  • stream_options (usage is always included in the final streaming chunk regardless)
  • prediction
  • store
  • service_tier
  • prompt_cache_key, prompt_cache_retention
  • safety_identifier
  • user
  • verbosity
  • tools[].function.strict (OpenAI structured tool outputs; accepted but silently dropped — the field has no effect)

Billing

  • Credits are checked before each request.
  • Cost is estimated before the request and reserved from your balance.
  • After the response completes, cost is recalculated using actual token usage and the difference is refunded or deducted.
  • The usage field in the response reflects actual token consumption.
  • Pricing is per-million tokens, billed separately for input and output.
  • Failed requests are not charged.
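Per-million billing works out to simple arithmetic; the prices below are placeholders (real prices depend on the model Sansa routes to):

```typescript
// Sketch: per-million-token billing with hypothetical prices.
// Input and output are billed separately, as described above.
function costUSD(
  promptTokens: number,
  completionTokens: number,
  inputPricePerM: number,   // USD per 1M input tokens (hypothetical)
  outputPricePerM: number,  // USD per 1M output tokens (hypothetical)
): number {
  return (
    (promptTokens / 1_000_000) * inputPricePerM +
    (completionTokens / 1_000_000) * outputPricePerM
  );
}
```

Plugging the usage object from a response into a function like this reproduces the post-response recalculation step.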