Reasoning

Extended thinking tokens for complex tasks

Overview

Reasoning gives the model extra "thinking" time before answering. Useful for math, logic, multi-step analysis, and code review. Two ways to enable it: the reasoning object or the top-level reasoning_effort parameter. Reasoning tokens count as output tokens.

Two Configuration Formats

Format A: reasoning object (OpenRouter-style)

The reasoning parameter is not part of the OpenAI SDK's native type definitions, so each SDK needs a small workaround. In TypeScript, include it directly in the request params — the SDK serializes unknown keys into the request body — and suppress the type error:

const response = await client.chat.completions.create({
  model: 'sansa-auto',
  messages: [{ role: 'user', content: 'Solve this math problem...' }],
  // @ts-expect-error — reasoning is not in OpenAI's type definitions
  reasoning: { effort: 'high' },
});

Or, equivalently, cast the params object so no suppression comment is needed:

const response = await client.chat.completions.create({
  model: 'sansa-auto',
  messages: [{ role: 'user', content: 'Solve this math problem...' }],
  reasoning: { effort: 'high' },
} as Parameters<typeof client.chat.completions.create>[0]);

In Python, use the SDK's extra_body keyword argument — its contents are merged into the request body:

response = client.chat.completions.create(
    model="sansa-auto",
    messages=[{"role": "user", "content": "Solve this math problem..."}],
    extra_body={"reasoning": {"effort": "high"}},
)

Raw JSON (curl / any HTTP client):

{
  "messages": [{"role": "user", "content": "Solve this math problem..."}],
  "reasoning": {
    "effort": "high"
  }
}
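If you are not using an SDK, the same body can be built and serialized with any HTTP client. A minimal sketch of constructing the payload in Python (endpoint and auth handling omitted; substitute your actual Sansa URL and key when POSTing):

```python
import json

# Request body for a reasoning-enabled chat completion.
payload = {
    "model": "sansa-auto",
    "messages": [{"role": "user", "content": "Solve this math problem..."}],
    "reasoning": {"effort": "high"},
}

# Serialize for the POST body of any HTTP client.
body = json.dumps(payload)
```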

Full reasoning object:

interface ReasoningConfig {
  // Reasoning depth; see "Effort Levels" for when to use each.
  effort?: "xhigh" | "high" | "medium" | "low" | "minimal" | "none";

  // Direct token budget (alternative to effort).
  max_tokens?: number;

  // If true, reasoning is used internally but not returned in the response.
  // Tokens are still generated and billed.
  exclude?: boolean;
}

Rule: set effort or max_tokens, not both.
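This rule can be enforced client-side before a request ever leaves your code. A sketch of a validating builder (the helper name is illustrative, not part of any SDK):

```python
def build_reasoning_config(effort=None, max_tokens=None, exclude=False):
    """Build a `reasoning` object, rejecting effort + max_tokens together."""
    if effort is not None and max_tokens is not None:
        raise ValueError("Use effort OR max_tokens, not both")
    config = {}
    if effort is not None:
        config["effort"] = effort
    if max_tokens is not None:
        config["max_tokens"] = max_tokens
    if exclude:
        config["exclude"] = True
    return config
```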

Format B: reasoning_effort (OpenAI-style)

{
  "messages": [{"role": "user", "content": "Solve this math problem..."}],
  "reasoning_effort": "high"
}

Accepted values: "xhigh", "high", "medium", "low", "minimal", "none".

Equivalent to reasoning: { effort: "high" }.

Precedence: If both reasoning and reasoning_effort are provided, the reasoning object takes precedence.
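The precedence rule can be mirrored client-side when requests are assembled from defaults that might set either style. A sketch (the function name is mine, not an SDK API):

```python
def resolve_reasoning(request: dict):
    """Return the effective reasoning config for a request dict.

    The `reasoning` object takes precedence over the shorthand
    `reasoning_effort` when both are present.
    """
    if "reasoning" in request:
        return request["reasoning"]
    if "reasoning_effort" in request:
        return {"effort": request["reasoning_effort"]}
    return None
```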

Effort Levels

| Level   | Description             | Use when                          |
|---------|-------------------------|-----------------------------------|
| xhigh   | Maximum reasoning depth | Complex math, logic puzzles       |
| high    | Deep reasoning          | Multi-step analysis, code review  |
| medium  | Balanced (default)      | General-purpose reasoning         |
| low     | Light reasoning         | Simple analysis, quick decisions  |
| minimal | Very light              | Basic tasks that benefit slightly |
| none    | No reasoning            | Disable reasoning entirely        |

Reasoning in Responses

Non-Streaming

Reasoning appears in choices[0].message.reasoning (string) and/or choices[0].message.reasoning_details (structured array).

TypeScript note: reasoning and reasoning_details are not part of ChatCompletionMessage's type definition in the OpenAI SDK. They are present in the response JSON and the SDK passes unrecognized fields through untouched, but TypeScript will not know about them. Access them via a type assertion:

const msg = response.choices[0].message as any;
console.log('Reasoning:', msg.reasoning);

Example response body:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The answer is 42.",
      "reasoning": "Let me think step by step...",
      "reasoning_details": [{
        "type": "reasoning.text",
        "text": "Let me think step by step...",
        "id": "reasoning-1",
        "format": "rf-c",
        "index": 0
      }]
    }
  }]
}
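Working from the raw response dict sidesteps the typing issue entirely. A sketch that pulls both fields, tolerating their absence (the helper name is hypothetical):

```python
def extract_reasoning(response: dict):
    """Return (reasoning_text, reasoning_details) from a chat completion
    response dict. Either may be None/empty when no reasoning was emitted."""
    message = response["choices"][0]["message"]
    return message.get("reasoning"), message.get("reasoning_details", [])
```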

Streaming

Reasoning details appear in delta.reasoning_details before content starts:

Note: Line breaks added for readability.

data: {
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "type": "reasoning.text",
        "text": "Step 1..."
      }]
    }
  }]
}

data: {
  "choices": [{
    "delta": { "content": "The answer is " }
  }]
}

data: {
  "choices": [{
    "delta": { "content": "42." }
  }]
}
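When consuming the stream manually, reasoning deltas can be accumulated alongside content. A sketch over already-parsed `data:` payloads (SSE parsing and network handling omitted; the function name is mine):

```python
def accumulate_stream(chunks):
    """Fold streaming deltas into (reasoning_text, content).

    Each chunk is a parsed `data:` payload; a delta may carry
    `reasoning_details` entries, `content`, or neither.
    """
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for detail in delta.get("reasoning_details", []):
            if detail.get("type") == "reasoning.text":
                reasoning_parts.append(detail.get("text", ""))
        if delta.get("content"):
            content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)
```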

reasoning_details Types

interface ReasoningDetail {
  // Common fields
  id: string;
  index: number;

  // Opaque format token assigned by Sansa (e.g., "rf-a", "rf-b", "rf-c", "rf-d").
  // Pass it back unchanged — do not parse or interpret these values.
  format: string;

  // Type-specific fields
  type: "reasoning.text" | "reasoning.summary" | "reasoning.encrypted";
  
  // For "reasoning.text"
  text?: string;
  signature?: string;
  
  // For "reasoning.summary"
  summary?: string;
  
  // For "reasoning.encrypted"
  data?: string;
}
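Consumers should branch on `type` and ignore unknown variants so that new detail types don't break parsing. A display-rendering sketch (the function is illustrative; encrypted blocks stay opaque and are only ever echoed back):

```python
def render_detail(detail: dict) -> str:
    """Render one reasoning_details entry for display."""
    kind = detail.get("type")
    if kind == "reasoning.text":
        return detail.get("text", "")
    if kind == "reasoning.summary":
        return detail.get("summary", "")
    if kind == "reasoning.encrypted":
        # Opaque payload: pass it back to the API, don't show it.
        return "[encrypted reasoning]"
    return ""  # Unknown type: skip for display, keep when echoing back.
```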

Preserving Reasoning Across Turns

When using the OpenAI SDK, reasoning data is preserved automatically in non-streaming responses. Just append the full message object to your messages array — the SDK keeps reasoning_details intact:

const response = await client.chat.completions.create({
  model: "sansa-auto",
  messages: [...],
  extra_body: { reasoning: { effort: "high" } },
});

// Append the full message — reasoning_details is preserved automatically
messages.push(response.choices[0].message);
messages.push({ role: "user", content: "Follow up question..." });

This maintains reasoning continuity across turns and tool calls.

If reasoning is missing, requests still work. If you manually construct messages or strip reasoning fields, Sansa handles it gracefully. Reasoning continuity improves quality but is never required.
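With raw dicts the same pattern applies: copy reasoning_details (and tool_calls) unmodified onto the assistant message you append. A sketch (the helper is illustrative, not an SDK API):

```python
def append_assistant_turn(messages: list, response: dict) -> None:
    """Append the assistant message from a raw response dict,
    carrying reasoning_details along unmodified when present."""
    msg = response["choices"][0]["message"]
    turn = {"role": "assistant", "content": msg.get("content")}
    if msg.get("tool_calls"):
        turn["tool_calls"] = msg["tool_calls"]
    if msg.get("reasoning_details"):
        turn["reasoning_details"] = msg["reasoning_details"]
    messages.append(turn)
```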

Streaming limitation

The OpenAI SDK does not currently auto-accumulate reasoning_details from streaming deltas. As a result, reasoning continuity is not available for streaming responses. This is a current limitation of SDK compatibility — non-streaming requests are recommended when reasoning continuity matters (e.g., multi-turn tool calling flows).

Example: tool calling with reasoning

{
  "messages": [
    {"role": "user", "content": "What's the weather?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{"id": "callwx001", "function": {"name": "get_weather", "arguments": "{}"}}],
      "reasoning_details": [/* pass back unmodified from previous response */]
    },
    {
      "role": "tool",
      "tool_call_id": "callwx001",
      "content": "{\"temp\": 72}"
    }
  ]
}

Billing

  • Reasoning tokens are counted as output tokens.
  • Included in usage.completion_tokens.
  • Higher reasoning effort = more output tokens = higher cost.
  • Use exclude: true if you want reasoning benefits without tokens in the response (tokens are still generated and billed, just not returned).
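Because reasoning tokens are already folded into usage.completion_tokens, a cost estimate only needs the standard usage fields. A sketch (the per-token prices here are placeholders, not real rates):

```python
def estimate_cost(usage: dict, prompt_price: float, completion_price: float) -> float:
    """Estimate request cost from a `usage` block, given per-token prices.
    Reasoning tokens are already included in completion_tokens."""
    return (usage["prompt_tokens"] * prompt_price
            + usage["completion_tokens"] * completion_price)
```

With exclude: true the reasoning text is absent from the response, but completion_tokens still includes it, so the estimate is unchanged.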