Streaming

Server-sent events streaming for chat completions

Overview

  • Set stream: true in the request
  • Response is Server-Sent Events (SSE)
  • Compatible with OpenAI SDK streaming, Vercel AI SDK, etc.
  • Usage data included in the final chunk automatically

Quick Example

See the code panel for streaming examples in Python, TypeScript, curl, and raw fetch with SSE parsing.

SSE Format

Each event is a JSON object followed by a blank line:

data: {"id":"...","object":"chat.completion.chunk","created":...,"model":"sansa-auto","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

Stream ends with:

data: [DONE]

The final chunk before [DONE] includes usage:

{
  "id": "...",
  "model": "sansa-auto",
  "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 84,
    "total_tokens": 96
  }
}

Streaming Chunk Shape

FieldDescription
delta.rolePresent in first chunk only
delta.contentIncremental text content
delta.tool_callsIncremental tool call data
delta.reasoning_detailsReasoning blocks (when enabled)
finish_reason"stop", "length", "tool_calls", or "error"
usagePresent in final chunk only

Sansa always includes usage in the final streaming chunk. The stream_options parameter is accepted for compatibility but ignored.

Error Handling

Pre-Stream Errors

If validation fails before streaming starts (bad API key, invalid params, insufficient credits), a standard JSON error response is returned:

{
  "error": {
    "code": "insufficient_credits",
    "message": "Insufficient credits. Please add credits to continue."
  }
}

Mid-Stream Errors

If an error occurs after streaming has begun (HTTP 200 already sent), the error is sent as an SSE event:

data: {"error":{"code":"provider_error","message":"Provider disconnected"},"choices":[{"finish_reason":"error"}]}

data: [DONE]

Mid-stream errors always have finish_reason: "error" and are followed by [DONE].

Streaming with Tool Calls

When the model makes a tool call during streaming:

  • delta.tool_calls appears incrementally
  • id and function.name arrive in the first tool call chunk
  • function.arguments streams as partial JSON fragments (must be accumulated)
  • finish_reason is "tool_calls" in the final choice chunk

Streaming with Reasoning

When reasoning tokens are enabled:

  • delta.reasoning_details appears before delta.content
  • Reasoning blocks stream incrementally
  • Content starts streaming after reasoning completes

Streaming with Structured Outputs

Structured outputs (response_format: { type: "json_schema" }) work with streaming. The model streams valid partial JSON that forms a complete valid response when done.