claude-sonnet-4-6
Claude Sonnet 4.6 is Anthropic's balanced everyday chat model, exposed
through api.reapi.ai as a drop-in OpenAI-compatible Chat Completions
endpoint (native Anthropic /v1/messages is also available). It supports
a 1M-token context window, up to 128K output tokens, vision input, and
tool use, with fast production latency. Current rates are listed on the
model page and at api.reapi.ai/pricing.
Quick example
cURL

curl https://api.reapi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "group": "default",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "stream": true,
    "max_tokens": 4096,
    "temperature": 0.7
  }'

Python (OpenAI SDK)

from openai import OpenAI
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai/v1",
)

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    max_tokens=4096,
    temperature=0.7,
    # `group` is not an OpenAI parameter, so it is passed through extra_body.
    extra_body={"group": "default"},
)

for chunk in stream:
    # Some chunks carry no text, hence the empty-string fallback.
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Python (Anthropic SDK)

from anthropic import Anthropic
client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai",
)

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

TypeScript (OpenAI SDK)

import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.reapi.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  max_tokens: 4096,
  temperature: 0.7,
  // @ts-expect-error `group` is an api.reapi.ai-specific extension
  group: "default",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Go

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	body, err := json.Marshal(map[string]any{
		"model": "claude-sonnet-4-6",
		"group": "default",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello"},
		},
		"stream":      true,
		"max_tokens":  4096,
		"temperature": 0.7,
	})
	if err != nil {
		log.Fatal(err)
	}

	req, err := http.NewRequest("POST",
		"https://api.reapi.ai/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	// Check the error before touching resp; resp is nil when the request fails.
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// With "stream": true the body is raw SSE text; it is printed unparsed here.
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}

Authentication
Every request needs a Bearer token. The Claude Sonnet 4.6 chat workspace
lives on the api.reapi.ai platform — sign in there to create a key and
top up tokens.
- Open api.reapi.ai and sign in (or create an account).
- Generate an API key under API Keys.
- Top up tokens under Top Up (pay-as-you-go, billed in USD per 1M tokens — see api.reapi.ai/pricing).
Authorization: Bearer YOUR_API_KEY

The chat surface (api.reapi.ai) is a separate workspace from the
image/video/audio task gateway at reapi.ai/api/v1/*. Keys and balances
do not cross over: a key issued on reapi.ai/settings/apikeys will not
authenticate against api.reapi.ai/v1/chat/completions, and vice versa.
Endpoints
POST https://api.reapi.ai/v1/chat/completions # OpenAI-compatible
POST https://api.reapi.ai/v1/messages # Anthropic-native

Both surfaces accept claude-sonnet-4-6. Pick whichever matches your SDK
of record:
- /v1/chat/completions: drop-in for the OpenAI SDKs. Same request shape, same SSE wire format. Set base_url to https://api.reapi.ai/v1.
- /v1/messages: native Anthropic Messages format. Set base_url to https://api.reapi.ai for the Anthropic Python / TypeScript SDKs. Use this when you want the full Anthropic tool-use spec or native content blocks.
Request body — /v1/chat/completions
model — string, required
Must be "claude-sonnet-4-6". The value is echoed back in the response
envelope.
messages — array, required
Conversation history as an array of message objects. Same shape as the OpenAI Chat Completions spec, plus content-parts for vision:
{
  "role": "system" | "user" | "assistant" | "tool",
  "content": "string OR content-parts array (text + image_url parts)"
}

max_tokens — integer, default 4096
Upper bound on output tokens. Anthropic's API requires max_tokens
on every call, including streamed ones, even though the OpenAI SDKs
treat it as optional, so set it explicitly. Set it generously (128000
is the hard cap); the model still stops at the natural end of its
response.
stream — boolean, default false
When true, the response is streamed as server-sent events (SSE) with
Content-Type: text/event-stream. Each event is a JSON delta in the
OpenAI format, terminated by a data: [DONE] line.
temperature — number, default 1
Range 0.0 – 1.0. Sampling temperature. Anthropic recommends tuning
either temperature or top_p, not both.
top_p — number, default 1
Range 0.0 – 1.0. Nucleus sampling cutoff.
tools / tool_choice — optional
Standard OpenAI tool-calling parameters. For Anthropic's native tool-use
schema, call /v1/messages directly.
group — string, default "default"
api.reapi.ai-specific extension. Selects a token group on the gateway. Omit if default routing is fine.
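
The tools / tool_choice parameters above only cover the request side. As a minimal sketch of the response side, assuming the gateway returns tool calls in the standard OpenAI shape (a `tool_calls` array on the assistant message, with `arguments` as a JSON string), the helper below runs each call and builds the `tool` messages to send back. The names `run_tool_calls`, `handlers`, and the stub `get_weather` are hypothetical.

```python
import json


def run_tool_calls(message, handlers):
    """Execute each tool call on an assistant message and build the replies.

    `message` is the assistant message dict (choices[0].message);
    `handlers` maps a tool name to a local Python callable (hypothetical).
    """
    replies = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # In the OpenAI wire format, arguments arrive as a JSON-encoded string.
        args = json.loads(fn["arguments"] or "{}")
        result = handlers[fn["name"]](**args)
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return replies


def get_weather(city):
    # Stand-in implementation; a real handler would query a weather service.
    return {"city": city, "forecast": "unknown"}
```

Append the assistant message and the returned `tool` messages to `messages`, then call the endpoint again so the model can produce its final answer.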
Vision input
Send images alongside text via OpenAI content-parts:
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Summarise the error in this screenshot." },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/screenshot.png" }
        }
      ]
    }
  ]
}

Supported formats: PNG, JPEG, GIF, WebP. Base64 URLs work too
(data:image/png;base64,...). Each image counts toward the input token
budget based on its resolution.
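
For local files, the base64 form can be built with the standard library. This is a small sketch, not part of any SDK: `image_part` and `SUPPORTED_MIME_TYPES` are hypothetical names, and the MIME list simply mirrors the four formats named above.

```python
import base64

# Mirrors the formats listed above; adjust if the gateway's list changes.
SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def image_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url content part."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image type: {mime_type}")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```

The returned dict drops into the `content` array next to a `{"type": "text", ...}` part.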
Response shape — /v1/chat/completions
Non-streaming (stream: false)
{
  "id": "chatcmpl-018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "object": "chat.completion",
  "created": 1735000000,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}

Streaming (stream: true)
Content-Type: text/event-stream. Each data: line is a JSON delta in
the OpenAI chunk format; the final event before [DONE] carries the
finish_reason (stop / length / tool_calls / content_filter).
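
If you consume the stream without an SDK, the wire format described above can be parsed by hand. The sketch below assumes each event fits on one `data:` line and that the lines have already been decoded to `str`; `iter_deltas` and `collect` are hypothetical helper names.

```python
import json


def iter_deltas(lines):
    """Yield (content, finish_reason) pairs from an iterable of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        choice = choices[0]
        content = (choice.get("delta") or {}).get("content") or ""
        yield content, choice.get("finish_reason")


def collect(lines):
    """Join all text deltas and return (text, last finish_reason seen)."""
    parts, finish = [], None
    for content, reason in iter_deltas(lines):
        parts.append(content)
        finish = reason or finish
    return "".join(parts), finish
```

`collect` is convenient for tests and logging; a chat UI would normally render each delta from `iter_deltas` as it arrives.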
Pricing
Claude Sonnet 4.6 is billed pay-as-you-go in USD against your api.reapi.ai token balance. Current rates live on api.reapi.ai/pricing and in the pricing card at the top of the model page.
Per-call bill:
input_cost = prompt_tokens × input_rate / 1,000,000
output_cost = completion_tokens × output_rate / 1,000,000

Failed requests are not charged.
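
The per-call formula translates directly into code. In this sketch the rates are arguments rather than constants, because the real numbers live on api.reapi.ai/pricing; the values used in the usage note are placeholders, not actual prices, and `call_cost` is a hypothetical name.

```python
def call_cost(prompt_tokens, completion_tokens, input_rate, output_rate):
    """Return the USD cost of one call; rates are USD per 1M tokens."""
    input_cost = prompt_tokens * input_rate / 1_000_000
    output_cost = completion_tokens * output_rate / 1_000_000
    return input_cost + output_cost
```

For example, `call_cost(usage["prompt_tokens"], usage["completion_tokens"], 2.0, 10.0)` prices a response from its `usage` block under placeholder rates of 2.0 and 10.0 USD per 1M tokens.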
Limits
| Limit | Value |
|---|---|
| Context window | 1M tokens |
| Max output per call | 128K tokens |
Streams that hit the output cap finish with finish_reason: "length";
call again with a continuation message if you need more text.
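
One way to build that follow-up call is to replay the conversation with the truncated text as an assistant turn and then ask for the rest. This is only a sketch of one possible continuation scheme: the helper names and the wording of the continuation prompt are assumptions, not something the gateway prescribes.

```python
def needs_continuation(response):
    """True when a non-streaming response stopped at the output cap."""
    return response["choices"][0].get("finish_reason") == "length"


def continuation_request(request, partial_text,
                         prompt="Continue exactly where you left off."):
    """Return a new request body that asks the model to keep going.

    The original request dict is left untouched.
    """
    messages = list(request["messages"])
    messages.append({"role": "assistant", "content": partial_text})
    messages.append({"role": "user", "content": prompt})
    return {**request, "messages": messages}
```

Concatenate the text from each round on the client side; every continuation call is billed as a normal request, including the replayed history as input tokens.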
Errors
The error envelope follows the OpenAI shape:
{
  "error": {
    "message": "...",
    "type": "invalid_request_error",
    "code": "..."
  }
}

| Status | When | Notes |
|---|---|---|
| 400 | Missing max_tokens, bad shape, etc. | Anthropic requires max_tokens; OpenAI SDKs that omit it will 400 here |
| 401 | Missing / invalid API key | Re-issue a key at api.reapi.ai |
| 402 | Insufficient balance | Top up at api.reapi.ai |
| 429 | Per-group rate limit hit | Back off, or move to a different group |
| 500 | Upstream / gateway error | Safe to retry; failed calls are not charged |
api.reapi.ai does not internally retry chat requests. Every customer call maps to exactly one upstream POST. If a network error reaches you, that's a one-for-one wire failure and a retry from your side is safe; the upstream provider may have already produced output, but the gateway will not double-bill.
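
Because the gateway does not retry for you, a client-side loop is the usual place for it. The sketch below retries only the statuses the table marks as retryable (429 and 500) with exponential backoff and jitter. The `send` callable, the attempt count, and the delay constants are all assumptions to tune for your workload.

```python
import random
import time

# Per the table above: back off on 429, retry on 500; other errors need a fix.
RETRYABLE_STATUSES = {429, 500}


def backoff_delay(attempt, base=1.0, cap=30.0, jitter=random.random):
    """Exponential backoff with full jitter, in seconds."""
    return min(cap, base * (2 ** attempt)) * jitter()


def call_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status.

    `send` is any zero-argument callable returning (status_code, body);
    at most `max_attempts` calls are made in total.
    """
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(backoff_delay(attempt))
        status, body = send()
    return status, body
```

Passing `sleep` and `jitter` as parameters keeps the loop testable without real waiting.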
Recipes
Minimum request
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "Summarise this in three sentences." }
  ]
}

Tool use (function calling)
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "What's the weather in Tokyo?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

Vision
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Read the error in this screenshot and suggest a fix." },
        {
          "type": "image_url",
          "image_url": { "url": "https://your-cdn.com/screenshot.png" }
        }
      ]
    }
  ]
}

When to pick Claude Sonnet 4.6 over Claude Opus 4.7
Both share the same endpoint, the same context window, and the same
OpenAI-compatible wire format, so switching is a one-line change in the
model field. Pick Claude Sonnet 4.6 for:
- Production chat traffic where time-to-first-token shows up in user-experience metrics.
- Code review and PR triage that runs across high volume.
- Mid-complexity agents where Claude-grade reasoning is enough and Opus-tier reasoning would be overkill.
- Default routing: use Sonnet as the everyday model and escalate to Opus for the genuinely hard turns.
Pick Claude Opus 4.7 for high-stakes coding, large refactors, complex multi-step agents, and long-context analysis where output quality dominates the decision.
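
The default-and-escalate pattern can be as small as one function. Everything specific in this sketch is an assumption: the Opus model id `claude-opus-4-7` is guessed by analogy with `claude-sonnet-4-6` (check the Opus model page), and the task labels and the 200,000-token threshold are illustrative, not recommendations from the gateway.

```python
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-7"  # assumed id, not confirmed by this page

# Hypothetical task labels for the cases where output quality dominates.
ESCALATE_TASKS = {"high_stakes_coding", "large_refactor", "multi_step_agent"}


def pick_model(task_kind, prompt_tokens=0, long_context_threshold=200_000):
    """Route to Sonnet by default; escalate to Opus for the hard turns."""
    if task_kind in ESCALATE_TASKS:
        return OPUS
    if prompt_tokens > long_context_threshold:
        return OPUS  # long-context analysis
    return SONNET
```

Since only the `model` field changes, the rest of the request body can stay identical between the two routes.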
Tips
- Set max_tokens generously. Anthropic enforces it strictly; the model still stops at the natural end of its response, but a low cap will truncate before the real ending.
- Stream by default for chat UX. Sonnet's lower time-to-first-token makes the perceived latency advantage especially visible in streamed responses.
- Tune temperature or top_p, not both. Mixing them produces results that are hard to reason about.
- Use Sonnet as the default and escalate to Opus. The cleanest production pattern: route everything to Sonnet, switch to Opus on the calls where quality matters most.
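
Several of these tips can be enforced before a request leaves your code. The sketch below builds a /v1/chat/completions body and checks the limits documented on this page (max_tokens up to 128000, temperature and top_p in 0.0 to 1.0). Rejecting requests that set both temperature and top_p is a local policy choice based on the tip above, not a gateway rule, and `build_request` is a hypothetical helper.

```python
MAX_OUTPUT_TOKENS = 128_000  # hard cap documented for this model


def build_request(messages, max_tokens=4096, temperature=None, top_p=None,
                  stream=True):
    """Return a validated request body for /v1/chat/completions."""
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be in 1..{MAX_OUTPUT_TOKENS}")
    if temperature is not None and top_p is not None:
        raise ValueError("tune temperature or top_p, not both")
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    body = {
        "model": "claude-sonnet-4-6",
        "messages": messages,
        "max_tokens": max_tokens,  # always sent explicitly
        "stream": stream,
    }
    if temperature is not None:
        body["temperature"] = temperature
    if top_p is not None:
        body["top_p"] = top_p
    return body
```

The resulting dict can be posted as JSON directly, or unpacked into `client.chat.completions.create(**body)` with the OpenAI SDK.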