Reasoning
Extended thinking tokens for complex tasks
Overview
Reasoning gives the model extra "thinking" time before it answers, which is useful for math, logic, multi-step analysis, and code review. There are two ways to enable it: the reasoning object or the top-level reasoning_effort parameter. Reasoning tokens are billed as output tokens.
Two Configuration Formats
Format A: reasoning object (OpenRouter-style)
```json
{
  "messages": [{"role": "user", "content": "Solve this math problem..."}],
  "reasoning": {
    "effort": "high"
  }
}
```

Full reasoning object:
| Field | Description |
|---|---|
| effort | "xhigh" \| "high" \| "medium" \| "low" \| "minimal" \| "none" |
| max_tokens | Direct token budget (alternative to effort) |
| exclude | Use reasoning internally but don't return it |
| enabled | Inferred from effort/max_tokens |
Rules: Use effort or max_tokens, not both. exclude: true means the model still reasons, but the reasoning is not included in the response.
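For illustration, here is a minimal Python sketch of Format A using max_tokens and exclude. It assumes an OpenAI-compatible /chat/completions endpoint; the base URL, model name, and environment variables are placeholders for your provider's values.

```python
import os
import requests

# Minimal sketch: Format A with a direct token budget and excluded reasoning.
# BASE_URL and the model name are placeholders, not a specific provider's values.
BASE_URL = os.environ.get("API_BASE_URL", "https://api.example.com/v1")
API_KEY = os.environ["API_KEY"]

payload = {
    "model": "your-reasoning-model",
    "messages": [{"role": "user", "content": "Solve this math problem..."}],
    "reasoning": {
        "max_tokens": 2000,  # direct budget; do not combine with "effort"
        "exclude": True,     # reason internally, but don't return the reasoning text
    },
}

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```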
Format B: reasoning_effort (OpenAI-style)
```json
{
  "messages": [{"role": "user", "content": "Solve this math problem..."}],
  "reasoning_effort": "high"
}
```

Accepted values: "xhigh", "high", "medium", "low", "minimal", "none"
Equivalent to reasoning: { effort: "high" }.
Precedence: If both reasoning and reasoning_effort are provided, the reasoning object takes precedence.
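As a quick illustration of the precedence rule (the effort values here are arbitrary):

```python
# If both fields are present, the "reasoning" object wins and "reasoning_effort" is ignored.
payload = {
    "messages": [{"role": "user", "content": "Solve this math problem..."}],
    "reasoning": {"effort": "low"},   # takes precedence
    "reasoning_effort": "high",       # ignored
}
# Effective setting: reasoning effort "low".
```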
Effort Levels
| Level | Description | Use when |
|---|---|---|
| xhigh | Maximum reasoning depth | Complex math, logic puzzles |
| high | Deep reasoning | Multi-step analysis, code review |
| medium | Balanced (default) | General-purpose reasoning |
| low | Light reasoning | Simple analysis, quick decisions |
| minimal | Very light | Basic tasks that benefit slightly |
| none | No reasoning | Disable reasoning entirely |
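If you set effort programmatically, a simple mapping from task category to effort level might look like the sketch below. The categories, defaults, and function name are hypothetical choices for illustration, not part of the API.

```python
# Hypothetical helper: map a rough task category to an effort level.
# The categories and defaults are illustrative, not defined by the API.
EFFORT_BY_TASK = {
    "math_proof": "xhigh",
    "code_review": "high",
    "general": "medium",
    "quick_decision": "low",
    "formatting": "minimal",
    "lookup": "none",
}

def effort_for(task_type: str) -> str:
    """Return an effort level for a task type, defaulting to the balanced setting."""
    return EFFORT_BY_TASK.get(task_type, "medium")

payload = {
    "messages": [{"role": "user", "content": "Review this diff..."}],
    "reasoning": {"effort": effort_for("code_review")},
}
```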
Reasoning in Responses
Non-Streaming
Reasoning appears in choices[0].message.reasoning (string) and/or choices[0].message.reasoning_details (structured array):
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The answer is 42.",
      "reasoning": "Let me think step by step...",
      "reasoning_details": [{
        "type": "reasoning.text",
        "text": "Let me think step by step...",
        "id": "reasoning-1",
        "format": "anthropic-claude-v1",
        "index": 0
      }]
    }
  }]
}
```
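For example, a minimal Python sketch of pulling these fields out of the parsed response body, assuming resp holds the JSON shown above:

```python
# Sketch: extracting reasoning from a parsed (non-streaming) response dict.
# `resp` is assumed to be the JSON body shown above, e.g. resp = response.json().
message = resp["choices"][0]["message"]

print("Answer:", message["content"])

# Plain-text reasoning, if the provider returns it as a string.
if message.get("reasoning"):
    print("Reasoning:", message["reasoning"])

# Structured reasoning blocks, if present.
for detail in message.get("reasoning_details", []):
    print(f"[{detail['type']}] {detail.get('text', '')}")
```

Streaming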
Reasoning details appear in delta.reasoning_details before content starts:
```
data: {"choices":[{"delta":{"reasoning_details":[{"type":"reasoning.text","text":"Step 1..."}]}}]}
data: {"choices":[{"delta":{"content":"The answer is "}}]}
data: {"choices":[{"delta":{"content":"42."}}]}
```

reasoning_details Types
| Type | Fields | Description |
|---|---|---|
| reasoning.text | text, signature | Raw reasoning text |
| reasoning.summary | summary | High-level summary |
| reasoning.encrypted | data | Encrypted/redacted block |
All types share: id, format, index
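Putting streaming and the detail types together, here is a Python sketch that consumes the SSE stream and dispatches on each reasoning_details type. It assumes an OpenAI-compatible /chat/completions endpoint reached with requests; the URL, API key, and the "[DONE]" end-of-stream sentinel are assumptions in the OpenAI style, not guarantees from this document.

```python
import json
import requests

# Sketch: consume an SSE stream and handle each reasoning_details type.
# The endpoint URL and API key are placeholders.
with requests.post(
    "https://api.example.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [{"role": "user", "content": "Solve this math problem..."}],
        "reasoning": {"effort": "high"},
        "stream": True,
    },
    stream=True,
    timeout=300,
) as response:
    for line in response.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            break
        chunk = json.loads(data)
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0].get("delta", {})

        # Reasoning details arrive before the answer; dispatch on the block type.
        for detail in delta.get("reasoning_details", []):
            if detail["type"] == "reasoning.text":
                print(detail.get("text", ""), end="")
            elif detail["type"] == "reasoning.summary":
                print(detail.get("summary", ""), end="")
            elif detail["type"] == "reasoning.encrypted":
                pass  # opaque block; keep it only to pass back in later requests

        # Regular answer tokens.
        if delta.get("content"):
            print(delta["content"], end="")
```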
Preserving Reasoning Blocks (Tool Calling)
When using reasoning + tools, pass reasoning_details back unmodified in assistant messages:
```json
{
  "messages": [
    {"role": "user", "content": "What's the weather?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{"id": "call_1", "function": {"name": "get_weather", "arguments": "{}"}}],
      "reasoning_details": [/* pass back unmodified from previous response */]
    },
    {
      "role": "tool",
      "tool_call_id": "call_1",
      "content": "{\"temp\": 72}"
    }
  ]
}
```

This maintains reasoning continuity across tool calls.
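In code, one round trip might look like the sketch below. The call_api and execute_tool callables are placeholders for your HTTP client and your own tool dispatcher; only the message shapes come from this document.

```python
# Sketch of one tool-calling round trip that preserves reasoning blocks.
# `call_api(payload)` and `execute_tool(name, args)` are placeholder callables.
def run_tool_round(call_api, execute_tool, messages, tools):
    first = call_api({
        "messages": messages,
        "tools": tools,
        "reasoning": {"effort": "high"},
    })
    assistant = first["choices"][0]["message"]

    # Echo the assistant turn back exactly, including reasoning_details, unmodified.
    messages.append({
        "role": "assistant",
        "content": assistant.get("content"),
        "tool_calls": assistant.get("tool_calls", []),
        "reasoning_details": assistant.get("reasoning_details", []),
    })

    # Run each requested tool and append its result.
    for call in assistant.get("tool_calls", []):
        result = execute_tool(call["function"]["name"], call["function"]["arguments"])
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })

    # Follow-up request: the model continues with its earlier reasoning intact.
    return call_api({"messages": messages, "tools": tools, "reasoning": {"effort": "high"}})
```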
Billing
- Reasoning tokens are counted as output tokens
- Included in usage.completion_tokens
- Higher reasoning effort = more output tokens = higher cost
- Use exclude: true if you want reasoning benefits without the reasoning text in the response (tokens are still generated and billed, just not returned)
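To see what you are paying for, read the usage block on the response. A small sketch; the per-token price is a made-up placeholder, not a real rate:

```python
# Sketch: reasoning tokens show up in usage.completion_tokens on the response.
# The per-token output price is a placeholder; substitute your model's real rate.
OUTPUT_PRICE_PER_TOKEN = 0.000004  # placeholder

usage = resp["usage"]  # resp is the parsed JSON body of a non-streaming response
completion_tokens = usage["completion_tokens"]  # includes reasoning tokens

print(f"output tokens (incl. reasoning): {completion_tokens}")
print(f"estimated output cost: ${completion_tokens * OUTPUT_PRICE_PER_TOKEN:.6f}")
```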