claude-sonnet-4-6

Claude Sonnet 4.6 — Anthropic's balanced everyday model. OpenAI-compatible /v1/chat/completions (or native /v1/messages) on api.reapi.ai with 1M context, 128K max output, vision input, and fast production latency.

Claude Sonnet 4.6 is Anthropic's balanced everyday chat model, exposed through api.reapi.ai as a drop-in OpenAI-compatible Chat Completions endpoint (native Anthropic /v1/messages also available). It offers a 1M-token context window, up to 128K output tokens per call, vision input, tool use, and fast production latency. Current rates live on the model page and on api.reapi.ai/pricing.

Quick example

curl https://api.reapi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "group": "default",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "stream": true,
    "max_tokens": 4096,
    "temperature": 0.7
  }'

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai/v1",
)

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    max_tokens=4096,
    temperature=0.7,
    extra_body={"group": "default"},
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai",
)

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.reapi.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  max_tokens: 4096,
  temperature: 0.7,
  // @ts-expect-error — `group` is an api.reapi.ai-specific extension
  group: "default",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]any{
        "model": "claude-sonnet-4-6",
        "group": "default",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello"},
        },
        "stream":      true,
        "max_tokens":  4096,
        "temperature": 0.7,
    })
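    req, err := http.NewRequest("POST",
        "https://api.reapi.ai/v1/chat/completions", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err) // resp is nil on error, so stop before touching resp.Body
    }
    defer resp.Body.Close()
    // io.ReadAll buffers the whole SSE stream and prints it once the response ends.
    out, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(out))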
}

Authentication

Every request needs a Bearer token. The Claude Sonnet 4.6 chat workspace lives on the api.reapi.ai platform — sign in there to create a key and top up tokens.

Open api.reapi.ai and sign in (or create an account).
Generate an API key under API Keys.
Top up tokens under Top Up (pay-as-you-go, billed in USD per 1M tokens — see api.reapi.ai/pricing).

Authorization: Bearer YOUR_API_KEY

The chat surface (api.reapi.ai) is a separate workspace from the image/video/audio task gateway at reapi.ai/api/v1/*. Keys and balances do not cross over — a key issued on reapi.ai/settings/apikeys will not authenticate against api.reapi.ai/v1/chat/completions, and vice versa.

Endpoints

POST https://api.reapi.ai/v1/chat/completions   # OpenAI-compatible
POST https://api.reapi.ai/v1/messages           # Anthropic-native

Both surfaces accept claude-sonnet-4-6. Pick whichever matches your SDK of record:

/v1/chat/completions — drop-in for the OpenAI SDKs. Same request shape, same SSE wire format. Set base_url to https://api.reapi.ai/v1.
/v1/messages — native Anthropic Messages format. Set base_url to https://api.reapi.ai for the Anthropic Python / TypeScript SDKs. Use this when you want the full Anthropic tool-use spec or native content blocks.

Request body — `/v1/chat/completions`

`model` — string, required

Must be "claude-sonnet-4-6". The value is echoed back in the response envelope.

`messages` — array, required

Conversation history as an array of message objects. Same shape as the OpenAI Chat Completions spec, plus content-parts for vision:

{
  "role": "system" | "user" | "assistant" | "tool",
  "content": "string OR content-parts array (text + image_url parts)"
}

`max_tokens` — integer, default `4096`

Upper bound on output tokens. Anthropic's API requires max_tokens on every call, including streamed ones — even though the OpenAI SDKs treat it as optional. Set it generously (128000 is the hard cap); the model still stops at the natural end of its response.
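A minimal sketch of one way to make sure every request carries max_tokens and stays within the cap described above. The helper name with_max_tokens is hypothetical and not part of any SDK; the two constants simply restate the default and the hard cap from this page.

```python
from typing import Optional

MAX_OUTPUT_TOKENS = 128_000  # hard cap stated in this section
DEFAULT_MAX_TOKENS = 4096    # documented default


def with_max_tokens(payload: dict, requested: Optional[int] = None) -> dict:
    """Return a copy of payload with max_tokens always set and within the cap."""
    if requested is None:
        requested = payload.get("max_tokens", DEFAULT_MAX_TOKENS)
    if requested < 1:
        raise ValueError("max_tokens must be a positive integer")
    # Clamp instead of failing so oversized requests still go through.
    return {**payload, "max_tokens": min(requested, MAX_OUTPUT_TOKENS)}
```

Clamping is a design choice for this sketch; a stricter client could raise on values above the cap instead.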

`stream` — boolean, default `false`

When true, the response is streamed as server-sent events (SSE) with Content-Type: text/event-stream. Each event is a JSON delta in the OpenAI format, terminated by a data: [DONE] line.
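If you read the stream without an SDK, the framing described above can be parsed in a few lines. This is a sketch that assumes each event arrives as a single `data:` line, which is how OpenAI-style streams are usually framed; iter_sse_chunks is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream
        yield json.loads(data)
```

The function accepts any iterable of decoded text lines, so it can be fed from an HTTP response body or from a recorded transcript in tests.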

`temperature` — number, default `1`

Range 0.0 – 1.0. Sampling temperature. Anthropic recommends tuning either temperature or top_p, not both.

`top_p` — number, default `1`

Range 0.0 – 1.0. Nucleus sampling cutoff.
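The ranges and the "one or the other" recommendation above can be enforced on the client before a request is sent. The following is a sketch only; sampling_params is a hypothetical helper, and rejecting the combination outright is stricter than the recommendation itself.

```python
from typing import Optional


def sampling_params(temperature: Optional[float] = None,
                    top_p: Optional[float] = None) -> dict:
    """Build the sampling fields for a request body, allowing at most one."""
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    params = {}
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is None:
            continue  # leave the field out so the server default applies
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
        params[name] = value
    return params
```

The returned dict can be merged into the request body, for example `{**body, **sampling_params(temperature=0.7)}`.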

`tools` / `tool_choice` — optional

Standard OpenAI tool-calling parameters. For Anthropic's native tool-use schema, call /v1/messages directly.

`group` — string, default `"default"`

api.reapi.ai-specific extension. Selects a token group on the gateway. Omit if default routing is fine.

Vision input

Send images alongside text via OpenAI content-parts:

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Summarise the error in this screenshot." },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/screenshot.png" }
        }
      ]
    }
  ]
}

Supported formats: PNG, JPEG, GIF, WebP. Base64 URLs work too (data:image/png;base64,...). Each image counts toward the input token budget based on its resolution.
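For local files, the base64 form mentioned above can be built with the standard library. This sketch assumes you already know the image format; image_part and the SUPPORTED_MIME table are hypothetical names that mirror the format list on this page.

```python
import base64

# MIME types for the formats listed above.
SUPPORTED_MIME = {
    "png": "image/png",
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "gif": "image/gif",
    "webp": "image/webp",
}


def image_part(data: bytes, fmt: str) -> dict:
    """Wrap raw image bytes as an image_url content part with a data URL."""
    mime = SUPPORTED_MIME.get(fmt.lower())
    if mime is None:
        raise ValueError(f"unsupported image format: {fmt}")
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}
```

A typical call would be `image_part(open("screenshot.png", "rb").read(), "png")`, placed in the content array next to a text part.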

Response shape — `/v1/chat/completions`

Non-streaming (`stream: false`)

{
  "id": "chatcmpl-018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "object": "chat.completion",
  "created": 1735000000,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}

Streaming (`stream: true`)

Content-Type: text/event-stream. Each data: line is a JSON delta in the OpenAI chunk format; the final event before [DONE] carries the finish_reason (stop / length / tool_calls / content_filter).
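A sketch of how a client might fold already-parsed chunks into the final text plus the finish_reason. It assumes the chunks follow the OpenAI chunk shape described above (choices[0].delta.content and choices[0].finish_reason); collect_stream is a hypothetical helper name.

```python
from typing import Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Concatenate delta text for choice 0 and return it with the finish_reason."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            if choice.get("index", 0) != 0:
                continue  # this sketch only tracks the first choice
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
            # Keep the last non-empty finish_reason seen in the stream.
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason
```

Checking the returned finish_reason for "length" is how a caller would detect a truncated response.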

Pricing

Claude Sonnet 4.6 is billed pay-as-you-go in USD against your api.reapi.ai token balance. Current rates live on api.reapi.ai/pricing and in the pricing card at the top of the model page.

Per-call bill:

input_cost  = prompt_tokens     × input_rate  / 1,000,000
output_cost = completion_tokens × output_rate / 1,000,000

Failed requests are not charged.
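The formula above translates directly into code. In this sketch the rates are passed in by the caller as USD per 1M tokens; the numbers in the usage note are placeholders, not actual prices, and call_cost is a hypothetical helper name. Decimal is used to avoid floating-point drift on small amounts.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def call_cost(prompt_tokens: int, completion_tokens: int,
              input_rate, output_rate) -> dict:
    """Apply the per-call formula; rates are USD per 1M tokens."""
    input_cost = Decimal(prompt_tokens) * Decimal(str(input_rate)) / PER_MILLION
    output_cost = Decimal(completion_tokens) * Decimal(str(output_rate)) / PER_MILLION
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total": input_cost + output_cost,
    }
```

With placeholder rates of 2 and 10, `call_cost(1_000_000, 500_000, "2", "10")` gives an input cost of 2, an output cost of 5, and a total of 7. Take prompt_tokens and completion_tokens from the usage object in the response, and the real rates from api.reapi.ai/pricing.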

Limits

Limit	Value
Context window	1M tokens
Max output per call	128K tokens

Streams that hit the output cap finish with finish_reason: "length"; call again with a continuation message if you need more text.
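One way to build the continuation call is to append the truncated text as an assistant turn and then ask the model to carry on. This is a sketch under that assumption; the prompt wording and both helper names are hypothetical.

```python
CONTINUE_PROMPT = "Continue exactly where you left off."  # hypothetical wording


def needs_continuation(finish_reason) -> bool:
    """True when the response stopped because it hit the output cap."""
    return finish_reason == "length"


def continuation_messages(messages: list, partial_text: str,
                          prompt: str = CONTINUE_PROMPT) -> list:
    """Return a new history: original turns, the truncated reply, then a nudge."""
    return [
        *messages,
        {"role": "assistant", "content": partial_text},
        {"role": "user", "content": prompt},
    ]
```

The caller concatenates the partial text with the text from the follow-up call; the original list is not modified.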

Errors

The error envelope follows the OpenAI shape:

{
  "error": {
    "message": "...",
    "type": "invalid_request_error",
    "code": "..."
  }
}

Status	When	Notes
`400`	Missing `max_tokens`, bad shape, etc.	Anthropic requires `max_tokens`; OpenAI SDKs that omit it will 400 here
`401`	Missing / invalid API key	Re-issue a key at api.reapi.ai
`402`	Insufficient balance	Top up at api.reapi.ai
`429`	Per-group rate limit hit	Back off, or move to a different `group`
`500`	Upstream / gateway error	Safe to retry — failed calls are not charged

api.reapi.ai does not internally retry chat requests: every customer call maps to exactly one upstream POST. If a network error reaches you, it reflects a single failed upstream call, and a retry from your side is safe. The upstream provider may have already produced output, but the gateway will not double-bill.
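Because retries are left to the client, a small backoff wrapper is a common pattern. The sketch below retries only the statuses the table marks as back-off or safe to retry (429 and 500) and returns everything else immediately. The send callable, the attempt count, and the delays are assumptions for illustration, and network exceptions would need their own handling around send.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500}  # statuses the table above marks as back-off / safe to retry


def call_with_retry(send: Callable[[], Tuple[int, dict]],
                    max_attempts: int = 4,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts:
            return status, body
        # Exponential backoff with a little jitter to avoid synchronized retries.
        sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
    raise RuntimeError("unreachable")  # the loop always returns on the last attempt
```

Here send is any zero-argument function that performs one request and returns (status_code, parsed_body); injecting sleep keeps the helper easy to test.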

Recipes

Minimum request

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "Summarise this in three sentences." }
  ]
}

Tool use (function calling)

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "What's the weather in Tokyo?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
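When the model decides to call the tool, the assistant message in the response carries a tool_calls array, and the client answers with one "tool" message per call. The sketch below assumes the gateway returns the standard OpenAI tool_calls shape (id, function.name, function.arguments as a JSON string); get_weather is a hypothetical stand-in for a real lookup.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical stand-in for a real weather lookup."""
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}


def tool_result_messages(assistant_message: dict) -> list:
    """Run each requested tool and build the matching 'tool' role messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        args = json.loads(function["arguments"])  # arguments arrive as a JSON string
        output = TOOLS[function["name"]](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": output,
        })
    return results
```

For the follow-up request, append the assistant message and then these tool messages to the history, and send it again with the same tools array.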

Vision

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Read the error in this screenshot and suggest a fix." },
        {
          "type": "image_url",
          "image_url": { "url": "https://your-cdn.com/screenshot.png" }
        }
      ]
    }
  ]
}

When to pick Claude Sonnet 4.6 over Claude Opus 4.7

Both share the same endpoint, the same context window, and the same OpenAI-compatible wire format, so switching is a one-line change in the model field. Pick Claude Sonnet 4.6 for:

Production chat traffic where time-to-first-token shows up in user-experience metrics.
Code review and PR triage that runs across high volume.
Mid-complexity agents where Sonnet-grade reasoning is enough and Opus-tier reasoning would be overkill.
Default routing: use Sonnet as the everyday model and escalate to Opus for the genuinely hard turns.

Pick Claude Opus 4.7 for high-stakes coding, large refactors, complex multi-step agents, and long-context analysis where output quality dominates the decision.
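The default-then-escalate pattern can be as small as a lookup. In this sketch the task labels are hypothetical, and the Opus model ID is an assumption: this page names Claude Opus 4.7 but does not give its identifier, so check the model list before using it.

```python
DEFAULT_MODEL = "claude-sonnet-4-6"
ESCALATION_MODEL = "claude-opus-4-7"  # assumed ID, not confirmed by this page

# Hypothetical task labels for the cases where output quality dominates.
HARD_TASKS = {
    "high_stakes_coding",
    "large_refactor",
    "complex_agent",
    "long_context_analysis",
}


def pick_model(task: str) -> str:
    """Route everyday traffic to Sonnet and escalate the listed hard tasks."""
    return ESCALATION_MODEL if task in HARD_TASKS else DEFAULT_MODEL
```

Since both models share the endpoint and wire format, the result only needs to be dropped into the model field of the request body.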

Tips

Set max_tokens generously. Anthropic enforces it strictly; the model still stops at the natural end of its response, but a low cap will truncate before the real ending.
Stream by default for chat UX. Sonnet's lower time-to-first-token makes the perceived latency advantage especially visible in streamed responses.
Tune temperature or top_p, not both. Mixing them produces results that are hard to reason about.
Use Sonnet as the default and escalate to Opus. The cleanest production pattern: route everything to Sonnet, switch to Opus on the calls where quality matters most.
