# Reasoning

Extended thinking tokens for complex tasks.

## Overview
Reasoning gives the model extra "thinking" time before answering. It is useful for math, logic, multi-step analysis, and code review. There are two ways to enable it: the `reasoning` object or the top-level `reasoning_effort` parameter. Reasoning tokens count as output tokens.
## Two Configuration Formats

### Format A: `reasoning` object (OpenRouter-style)

The `reasoning` parameter is not part of the OpenAI SDK's native type definitions. Pass it via `extra_body` in the TypeScript and Python SDKs:
```typescript
const response = await client.chat.completions.create({
  model: 'sansa-auto',
  messages: [{ role: 'user', content: 'Solve this math problem...' }],
  // @ts-ignore or cast — reasoning is not in OpenAI's type definitions
  ...({ extra_body: { reasoning: { effort: 'high' } } } as object),
} as Parameters<typeof client.chat.completions.create>[0]);
```

Or more simply, using the SDK's `extra_body` request option directly:
```typescript
const response = await client.chat.completions.create(
  {
    model: 'sansa-auto',
    messages: [{ role: 'user', content: 'Solve this math problem...' }],
  },
  {
    // @ts-expect-error extra_body is not in the official type
    extra_body: { reasoning: { effort: 'high' } },
  }
);
```

In Python, use the `extra_body` keyword argument, which the SDK merges into the request body:
```python
response = client.chat.completions.create(
    model="sansa-auto",
    messages=[{"role": "user", "content": "Solve this math problem..."}],
    extra_body={"reasoning": {"effort": "high"}},
)
```

Raw JSON (curl or any HTTP client):
```json
{
  "messages": [{"role": "user", "content": "Solve this math problem..."}],
  "reasoning": {
    "effort": "high"
  }
}
```

The full `reasoning` object:
```typescript
interface ReasoningConfig {
  // One of "xhigh", "high", "medium", "low", "minimal", "none".
  effort?: string;

  // Direct token budget (alternative to effort).
  max_tokens?: number;

  // If true, reasoning is used internally but not returned in the response.
  // Tokens are still generated and billed.
  exclude?: boolean;
}
```

Rules: use `effort` OR `max_tokens`, not both.
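The mutual-exclusion rule is easy to enforce client-side before sending a request. A minimal sketch; the helper and its error messages are illustrative, not part of any SDK:

```python
# Effort levels as documented for the reasoning object.
VALID_EFFORTS = {"xhigh", "high", "medium", "low", "minimal", "none"}

def validate_reasoning(config: dict) -> None:
    """Reject reasoning configs that break the documented rules."""
    if "effort" in config and "max_tokens" in config:
        raise ValueError("use effort OR max_tokens, not both")
    if "effort" in config and config["effort"] not in VALID_EFFORTS:
        raise ValueError(f"unknown effort level: {config['effort']}")
```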
### Format B: `reasoning_effort` (OpenAI-style)

```json
{
  "messages": [{"role": "user", "content": "Solve this math problem..."}],
  "reasoning_effort": "high"
}
```

Accepted values: `"xhigh"`, `"high"`, `"medium"`, `"low"`, `"minimal"`, `"none"`.
Equivalent to `reasoning: { effort: "high" }`.

Precedence: if both `reasoning` and `reasoning_effort` are provided, the `reasoning` object takes precedence.
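The precedence rule can be mirrored client-side, for example when logging which setting a request will actually use. A sketch; `effective_effort` is a hypothetical helper, not an API call:

```python
def effective_effort(body: dict):
    """Return the effort the server would act on, per the documented precedence."""
    reasoning = body.get("reasoning")
    if reasoning is not None:
        # The reasoning object wins over the reasoning_effort shorthand.
        return reasoning.get("effort")
    return body.get("reasoning_effort")
```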
## Effort Levels

| Level | Description | Use when |
|---|---|---|
| `xhigh` | Maximum reasoning depth | Complex math, logic puzzles |
| `high` | Deep reasoning | Multi-step analysis, code review |
| `medium` | Balanced (default) | General-purpose reasoning |
| `low` | Light reasoning | Simple analysis, quick decisions |
| `minimal` | Very light | Basic tasks that benefit slightly |
| `none` | No reasoning | Disable reasoning entirely |
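One way to put this table to work is a per-task lookup that falls back to the default level. The task names below are illustrative, not part of the API:

```python
# Illustrative mapping from task categories to documented effort levels.
EFFORT_BY_TASK = {
    "math_proof": "xhigh",
    "code_review": "high",
    "quick_decision": "low",
    "formatting": "minimal",
}

def effort_for(task: str) -> str:
    # "medium" is the documented default level.
    return EFFORT_BY_TASK.get(task, "medium")
```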
## Reasoning in Responses

### Non-Streaming
Reasoning appears in `choices[0].message.reasoning` (a string) and/or `choices[0].message.reasoning_details` (a structured array).
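If you are working with the raw response payload (e.g., from your own HTTP client), both fields can be read defensively. A sketch assuming the documented field names:

```python
def extract_reasoning(payload: dict):
    """Pull reasoning fields from a non-streaming chat completion payload.

    Both fields are optional: they are absent when reasoning is disabled
    or when exclude: true was set on the request.
    """
    message = payload["choices"][0]["message"]
    return message.get("reasoning"), message.get("reasoning_details", [])
```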
TypeScript note: `reasoning` and `reasoning_details` are not part of `ChatCompletionMessage`'s type definition in the OpenAI SDK. The SDK deserializes the full response JSON, so the fields are present at runtime, but TypeScript will not know about them. Access them via a type assertion:

```typescript
const msg = response.choices[0].message as any;
console.log('Reasoning:', msg.reasoning);
```

Example response:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The answer is 42.",
      "reasoning": "Let me think step by step...",
      "reasoning_details": [{
        "type": "reasoning.text",
        "text": "Let me think step by step...",
        "id": "reasoning-1",
        "format": "rf-c",
        "index": 0
      }]
    }
  }]
}
```

### Streaming
Reasoning details appear in `delta.reasoning_details` before content starts. Note: line breaks added for readability.

```text
data: {
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "type": "reasoning.text",
        "text": "Step 1..."
      }]
    }
  }]
}

data: {
  "choices": [{
    "delta": { "content": "The answer is " }
  }]
}

data: {
  "choices": [{
    "delta": { "content": "42." }
  }]
}
```

### `reasoning_details` Types
```typescript
interface ReasoningDetail {
  // Common fields
  id: string;
  index: number;

  // Opaque format token assigned by Sansa (e.g., "rf-a", "rf-b", "rf-c", "rf-d").
  // Pass it back unchanged — do not parse or interpret these values.
  format: string;

  // Type-specific fields
  type: "reasoning.text" | "reasoning.summary" | "reasoning.encrypted";

  // For "reasoning.text"
  text?: string;
  signature?: string;

  // For "reasoning.summary"
  summary?: string;

  // For "reasoning.encrypted"
  data?: string;
}
```

## Preserving Reasoning Across Turns
When using the OpenAI SDK, reasoning data is preserved automatically in non-streaming responses. Just append the full message object to your `messages` array — the SDK keeps `reasoning_details` intact:

```typescript
const response = await client.chat.completions.create({
  model: "sansa-auto",
  messages: [...],
  extra_body: { reasoning: { effort: "high" } },
});

// Append the full message — reasoning_details is preserved automatically
messages.push(response.choices[0].message);
messages.push({ role: "user", content: "Follow up question..." });
```

This maintains reasoning continuity across turns and tool calls.
If reasoning is missing, requests still work. If you manually construct messages or strip reasoning fields, Sansa handles it gracefully. Reasoning continuity improves quality but is never required.
### Streaming limitation

The OpenAI SDK does not currently auto-accumulate `reasoning_details` from streaming deltas, so reasoning continuity is not available for streaming responses. This is a current SDK-compatibility limitation; prefer non-streaming requests when reasoning continuity matters (e.g., multi-turn tool calling flows).
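Until the SDK handles this, the deltas can be accumulated manually. A minimal sketch that assumes the chunk shape shown in the streaming example above, concatenating `text` per detail `index`:

```python
def accumulate_reasoning(chunks):
    """Merge streamed reasoning_details deltas into complete entries."""
    merged = {}
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        for detail in delta.get("reasoning_details", []):
            entry = merged.setdefault(detail.get("index", 0), {})
            for key, value in detail.items():
                if key == "text":
                    # Text arrives in pieces; other fields simply overwrite.
                    entry["text"] = entry.get("text", "") + value
                else:
                    entry[key] = value
    return [merged[i] for i in sorted(merged)]
```

The merged list can then be passed back as `reasoning_details` on the assistant message, mirroring the non-streaming flow.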
### Example: tool calling with reasoning

```json
{
  "messages": [
    {"role": "user", "content": "What's the weather?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{"id": "callwx001", "type": "function", "function": {"name": "get_weather", "arguments": "{}"}}],
      "reasoning_details": [/* pass back unmodified from previous response */]
    },
    {
      "role": "tool",
      "tool_call_id": "callwx001",
      "content": "{\"temp\": 72}"
    }
  ]
}
```

## Billing
- Reasoning tokens are counted as output tokens.
- They are included in `usage.completion_tokens`.
- Higher reasoning effort = more output tokens = higher cost.
- Use `exclude: true` if you want the reasoning benefits without the reasoning text in the response (tokens are still generated and billed, just not returned).
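Because reasoning tokens are folded into `usage.completion_tokens`, a cost estimate needs no separate reasoning term. A sketch; the per-token price below is a placeholder, not a published rate:

```python
def completion_cost(usage: dict, output_price_per_token: float) -> float:
    """Estimate output-side cost; completion_tokens already includes reasoning."""
    return usage["completion_tokens"] * output_price_per_token
```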