Reasoning

Extended thinking tokens for complex tasks

Overview

Reasoning gives the model extra "thinking" time before answering. Useful for math, logic, multi-step analysis, and code review. Two ways to enable it: the reasoning object or the top-level reasoning_effort parameter. Reasoning tokens count as output tokens.

Two Configuration Formats

Format A: reasoning object (OpenRouter-style)

{
  "messages": [{"role": "user", "content": "Solve this math problem..."}],
  "reasoning": {
    "effort": "high"
  }
}

Full reasoning object:

Field       Description
effort      "xhigh" | "high" | "medium" | "low" | "minimal" | "none"
max_tokens  Direct token budget (alternative to effort)
exclude     Use reasoning internally but don't return it
enabled     Inferred from effort/max_tokens

Rules: Set either effort or max_tokens, not both. With exclude: true, the model still reasons, but the reasoning is omitted from the response.
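The rules above can be enforced on the client before a request is sent. The following is a minimal sketch, not part of the API: build_reasoning is a hypothetical helper, and the 2000-token budget is an arbitrary illustrative value.

```python
from typing import Optional

# Effort values listed in the table above.
ALLOWED_EFFORTS = {"xhigh", "high", "medium", "low", "minimal", "none"}


def build_reasoning(effort: Optional[str] = None,
                    max_tokens: Optional[int] = None,
                    exclude: bool = False) -> dict:
    """Hypothetical helper: build a `reasoning` object, allowing effort OR max_tokens."""
    if effort is not None and max_tokens is not None:
        raise ValueError("use effort OR max_tokens, not both")
    reasoning: dict = {}
    if effort is not None:
        if effort not in ALLOWED_EFFORTS:
            raise ValueError(f"unknown effort: {effort!r}")
        reasoning["effort"] = effort
    if max_tokens is not None:
        reasoning["max_tokens"] = max_tokens
    if exclude:
        # The model still reasons; the reasoning is just not returned.
        reasoning["exclude"] = True
    return reasoning


# Request body with a direct token budget (illustrative value) and hidden reasoning.
body = {
    "messages": [{"role": "user", "content": "Solve this math problem..."}],
    "reasoning": build_reasoning(max_tokens=2000, exclude=True),
}
```

Validating locally surfaces a conflicting effort/max_tokens pair as a clear error before the request leaves the client.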

Format B: reasoning_effort (OpenAI-style)

{
  "messages": [{"role": "user", "content": "Solve this math problem..."}],
  "reasoning_effort": "high"
}

Accepted values: "xhigh", "high", "medium", "low", "minimal", "none"

The example above is equivalent to "reasoning": {"effort": "high"}.

Precedence: If both reasoning and reasoning_effort are provided, the reasoning object takes precedence.
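To illustrate the precedence rule, here is a small sketch that mirrors it on the client side. effective_reasoning is a hypothetical helper. How the server resolves the two fields internally is not specified here, so this only models the documented outcome.

```python
from typing import Optional


def effective_reasoning(body: dict) -> Optional[dict]:
    """Hypothetical helper: return the reasoning config the documented precedence implies."""
    reasoning = body.get("reasoning")
    if reasoning is not None:
        # The reasoning object wins whenever it is present.
        return reasoning
    effort = body.get("reasoning_effort")
    if effort is not None:
        # Format B maps onto Format A's effort field.
        return {"effort": effort}
    return None


# Both formats supplied: the reasoning object takes precedence.
both = {"reasoning": {"effort": "low"}, "reasoning_effort": "high"}
resolved = effective_reasoning(both)
```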

Effort Levels

Level    Description              Use when
xhigh    Maximum reasoning depth  Complex math, logic puzzles
high     Deep reasoning           Multi-step analysis, code review
medium   Balanced (default)       General-purpose reasoning
low      Light reasoning          Simple analysis, quick decisions
minimal  Very light               Basic tasks that benefit slightly
none     No reasoning             Disable reasoning entirely

Reasoning in Responses

Non-Streaming

Reasoning appears in choices[0].message.reasoning (string) and/or choices[0].message.reasoning_details (structured array):

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The answer is 42.",
      "reasoning": "Let me think step by step...",
      "reasoning_details": [{
        "type": "reasoning.text",
        "text": "Let me think step by step...",
        "id": "reasoning-1",
        "format": "anthropic-claude-v1",
        "index": 0
      }]
    }
  }]
}
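Because reasoning may arrive in either field, a client can check both. The sketch below assumes the response has already been parsed into a Python dict shaped like the example above. extract_reasoning is a hypothetical helper, and preferring the plain string over the structured array is a design choice of this sketch, not a documented rule.

```python
def extract_reasoning(response: dict) -> str:
    """Hypothetical helper: return reasoning text from a non-streaming response."""
    message = response["choices"][0]["message"]
    text = message.get("reasoning")
    if text:
        return text
    # Fall back to the structured array, ordered by each entry's index.
    details = message.get("reasoning_details") or []
    ordered = sorted(details, key=lambda d: d.get("index", 0))
    return "".join(
        d["text"] for d in ordered
        if d.get("type") == "reasoning.text" and "text" in d
    )


# A response carrying only reasoning_details (no top-level reasoning string).
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "The answer is 42.",
            "reasoning_details": [{
                "type": "reasoning.text",
                "text": "Let me think step by step...",
                "id": "reasoning-1",
                "format": "anthropic-claude-v1",
                "index": 0,
            }],
        }
    }]
}
reasoning_text = extract_reasoning(sample)
```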

Streaming

Reasoning details appear in delta.reasoning_details before content starts:

data: {"choices":[{"delta":{"reasoning_details":[{"type":"reasoning.text","text":"Step 1..."}]}}]}
data: {"choices":[{"delta":{"content":"The answer is "}}]}
data: {"choices":[{"delta":{"content":"42."}}]}
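A stream like the one above can be folded into separate reasoning and content strings. This is a minimal sketch that takes already-received "data: ..." lines as input. collect_stream is a hypothetical helper, and the "[DONE]" terminator check is an assumption borrowed from OpenAI-style streams, not something stated in this document.

```python
import json


def collect_stream(lines):
    """Hypothetical helper: accumulate reasoning text and content from SSE data lines."""
    reasoning, content = [], []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank lines and anything that is not a data line
        payload = line[len("data: "):]
        if payload == "[DONE]":  # assumption: OpenAI-style end-of-stream marker
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        for detail in delta.get("reasoning_details") or []:
            if detail.get("type") == "reasoning.text":
                reasoning.append(detail.get("text", ""))
        if delta.get("content"):
            content.append(delta["content"])
    return "".join(reasoning), "".join(content)


# The three events from the example above.
events = [
    'data: {"choices":[{"delta":{"reasoning_details":[{"type":"reasoning.text","text":"Step 1..."}]}}]}',
    'data: {"choices":[{"delta":{"content":"The answer is "}}]}',
    'data: {"choices":[{"delta":{"content":"42."}}]}',
]
streamed_reasoning, streamed_content = collect_stream(events)
```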

reasoning_details Types

Type                 Fields           Description
reasoning.text       text, signature  Raw reasoning text
reasoning.summary    summary          High-level summary
reasoning.encrypted  data             Encrypted/redacted block

All types share: id, format, index
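Since each type carries its payload in a different field, display code usually branches on type. A minimal sketch follows. describe_detail is a hypothetical helper, and the placeholder strings for encrypted and unknown blocks are choices of this sketch.

```python
def describe_detail(detail: dict) -> str:
    """Hypothetical helper: return displayable text for one reasoning_details entry."""
    kind = detail.get("type")
    if kind == "reasoning.text":
        return detail.get("text", "")
    if kind == "reasoning.summary":
        return detail.get("summary", "")
    if kind == "reasoning.encrypted":
        # The data field is opaque; show a placeholder rather than the payload.
        return "[encrypted reasoning]"
    return f"[unknown reasoning type: {kind}]"


details = [
    {"type": "reasoning.summary", "summary": "Checked both cases.", "index": 0},
    {"type": "reasoning.encrypted", "data": "opaque-bytes", "index": 1},
]
rendered = [describe_detail(d) for d in details]
```

Note that this is for display only. When sending reasoning_details back to the API, pass the original entries rather than anything derived from them.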

Preserving Reasoning Blocks (Tool Calling)

When using reasoning + tools, pass reasoning_details back unmodified in assistant messages:

{
  "messages": [
    {"role": "user", "content": "What's the weather?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [{"id": "call_1", "function": {"name": "get_weather", "arguments": "{}"}}],
      "reasoning_details": [/* pass back unmodified from previous response */]
    },
    {
      "role": "tool",
      "tool_call_id": "call_1",
      "content": "{\"temp\": 72}"
    }
  ]
}

This maintains reasoning continuity across tool calls.
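One way to avoid accidentally altering reasoning_details is to append the assistant message from the previous response as a whole, then add the tool results. The sketch below assumes the assistant message is available as a dict. append_tool_turn and the tool_results mapping are hypothetical names used for illustration.

```python
import copy


def append_tool_turn(messages: list, assistant_message: dict, tool_results: dict) -> list:
    """Hypothetical helper: extend the conversation with the assistant turn and tool outputs.

    tool_results maps each tool call id to that tool's result string.
    """
    updated = list(messages)
    # Copy the whole assistant message so reasoning_details goes back unmodified.
    updated.append(copy.deepcopy(assistant_message))
    for call in assistant_message.get("tool_calls", []):
        updated.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": tool_results[call["id"]],
        })
    return updated


history = [{"role": "user", "content": "What's the weather?"}]
assistant = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{"id": "call_1", "function": {"name": "get_weather", "arguments": "{}"}}],
    "reasoning_details": [{"type": "reasoning.text", "text": "Need the weather tool.", "index": 0}],
}
next_messages = append_tool_turn(history, assistant, {"call_1": "{\"temp\": 72}"})
```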

Billing

  • Reasoning tokens are counted as output tokens
  • Included in usage.completion_tokens
  • Higher reasoning effort = more output tokens = higher cost
  • Use exclude: true to get the benefit of reasoning without the reasoning text in the response (the tokens are still generated and billed, just not returned)