claude-opus-4-8
Claude Opus 4.8 is Anthropic's most capable model for complex reasoning
and long-horizon agentic coding, exposed through api.reapi.ai as a
drop-in OpenAI-compatible Chat Completions endpoint (the native
Anthropic /v1/messages surface is also available). It offers a 1M-token
context window, 128K max output tokens, vision input, prompt caching,
and tool use. Current rates live on the model page and on
api.reapi.ai/pricing.
Quick example
curl https://api.reapi.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-8",
"group": "default",
"messages": [
{ "role": "user", "content": "Hello" }
],
"stream": true,
"max_tokens": 4096,
"temperature": 0.7
}'

from openai import OpenAI
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai/v1",
)
stream = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    max_tokens=4096,
    temperature=0.7,
    extra_body={"group": "default"},
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

from anthropic import Anthropic
client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai",
)
with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

import OpenAI from "openai";
const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://api.reapi.ai/v1",
});
const stream = await client.chat.completions.create({
model: "claude-opus-4-8",
messages: [{ role: "user", content: "Hello" }],
stream: true,
max_tokens: 4096,
temperature: 0.7,
// `group` is an api.reapi.ai-specific extension, sent as an extra top-level field.
// @ts-expect-error: not part of the OpenAI types
group: "default",
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body, err := json.Marshal(map[string]any{
		"model": "claude-opus-4-8",
		"group": "default",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello"},
		},
		"stream":      true,
		"max_tokens":  4096,
		"temperature": 0.7,
	})
	if err != nil {
		panic(err)
	}
	req, err := http.NewRequest("POST",
		"https://api.reapi.ai/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err) // without this check, a failed request would dereference a nil resp
	}
	defer resp.Body.Close()
	// With "stream": true the body is raw SSE text; ReadAll buffers it until the stream ends.
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}

Authentication
Every request needs a Bearer token. The Claude Opus 4.8 chat workspace
lives on the api.reapi.ai platform — sign in there to create a key and
top up tokens.
- Open api.reapi.ai and sign in (or create an account).
- Generate an API key under API Keys.
- Top up tokens under Top Up (pay-as-you-go, billed in USD per 1M tokens — see api.reapi.ai/pricing).
Authorization: Bearer YOUR_API_KEY

The chat surface (api.reapi.ai) is a separate workspace from the
image/video/audio task gateway at reapi.ai/api/v1/*. Keys and balances
do not cross over — a key issued on reapi.ai/settings/apikeys will not
authenticate against api.reapi.ai/v1/chat/completions, and vice versa.
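
One way to keep the key out of source code is to read it from the environment. This is a minimal sketch: the helper name `auth_headers` and the variable name `REAPI_API_KEY` are assumptions made for illustration, not names the platform defines.

```python
import os
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Build the headers for a request to the api.reapi.ai chat surface."""
    # REAPI_API_KEY is an assumed variable name chosen for this example.
    key = api_key or os.environ.get("REAPI_API_KEY", "")
    if not key:
        raise ValueError("no API key: pass api_key or set REAPI_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Because keys do not cross over between the two workspaces, a 401 from this surface is worth checking against where the key was issued before anything else.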
Endpoints
POST https://api.reapi.ai/v1/chat/completions # OpenAI-compatible
POST https://api.reapi.ai/v1/messages # Anthropic-native

Both surfaces accept claude-opus-4-8. Pick whichever matches your SDK
of record:
- /v1/chat/completions: drop-in for the OpenAI SDKs. Same request shape, same SSE wire format. Set base_url to https://api.reapi.ai/v1.
- /v1/messages: native Anthropic Messages format. Set base_url to https://api.reapi.ai for the Anthropic Python / TypeScript SDKs. Required for callers that need Anthropic-specific features (cache_control blocks for prompt caching, native multi-block content, the full tool-use spec).
Request body — /v1/chat/completions
model — string, required
Must be "claude-opus-4-8". The value is echoed back in the response
envelope.
messages — array, required
Conversation history as an array of message objects. Same shape as the OpenAI Chat Completions spec, plus content-parts for vision:
{
"role": "system" | "user" | "assistant" | "tool",
"content": "string OR content-parts array (text + image_url parts)"
}

Multi-turn history is sent in chronological order: the last message is the one Claude responds to.
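
A small sketch of keeping a conversation in chronological order on the client side. The helper `add_turn` is hypothetical; it only builds the list that would be sent as `messages`.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def add_turn(history: list, role: str, content) -> list:
    """Return a new history with one message appended at the end."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    # A new list is returned so earlier snapshots of the history stay unchanged.
    return history + [{"role": role, "content": content}]


history = add_turn([], "user", "What is a monad?")
history = add_turn(history, "assistant", "A monad is a design pattern for chaining computations.")
history = add_turn(history, "user", "Show a short example.")  # the model answers this one
```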
max_tokens — integer, default 4096
Upper bound on output tokens. Anthropic's API requires max_tokens
on every call, including streamed ones — even though the OpenAI SDKs
treat it as optional. Set it generously (128000 is the hard cap on the
synchronous API) for long-form outputs; the model still stops at the
natural end of its response.
stream — boolean, default false
When true, the response is streamed as server-sent events (SSE) with
Content-Type: text/event-stream. Each event is a JSON delta in the
OpenAI format, terminated by a data: [DONE] line.
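
For callers not using an SDK, the stream can be decoded by hand. The sketch below assumes the body has already been split into text lines; `iter_sse_chunks` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each JSON delta from an SSE stream, stopping at data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # terminator: nothing after this is part of the response
        yield json.loads(payload)
```

With an HTTP client that exposes the body line by line, the same generator can be fed directly from the response.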
temperature — number, default 1
Range 0.0 – 1.0. Sampling temperature. Anthropic recommends
either temperature or top_p, not both. Lower values produce
more deterministic output.
top_p — number, default 1
Range 0.0 – 1.0. Nucleus sampling cutoff.
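
A client-side check can catch out-of-range sampling values before a request is sent. In this sketch the helper `sampling_params` is hypothetical, and rejecting a request that sets both values is a choice made here; the text above only describes it as a recommendation.

```python
from typing import Optional


def sampling_params(temperature: Optional[float] = None,
                    top_p: Optional[float] = None) -> dict:
    """Return only the sampling fields that were set, after range checks."""
    if temperature is not None and top_p is not None:
        # Stricter than the API itself: this sketch refuses to mix the two.
        raise ValueError("set temperature or top_p, not both")
    params = {}
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within 0.0 and 1.0")
        params[name] = value
    return params
```

The returned dict can be merged into the request body, so unset fields fall back to the documented defaults.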
tools / tool_choice — optional
Standard OpenAI tool-calling parameters. Claude Opus 4.8 supports the
full OpenAI tool-use spec via this surface and uses tools more
efficiently than prior Opus models — fewer steps for the same result.
For Anthropic's native tool-use schema (with cache_control,
tool_choice_type, etc.) call /v1/messages directly.
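
When the model decides to call a tool, the OpenAI format returns the calls on the assistant message and expects one `tool` message per call in the follow-up request. The sketch below assumes the gateway returns `tool_calls` in the standard OpenAI shape; `get_weather`, `TOOLS`, and `run_tool_calls` are illustrative names.

```python
import json


def get_weather(city: str) -> str:
    # Stand-in implementation; a real one would query a weather service.
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute each requested tool and build the follow-up tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = TOOLS[call["function"]["name"]]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": func(**args),
        })
    return results
```

The next request would append the assistant message and these tool messages to the history, in that order, so the model can produce its final answer.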
group — string, default "default"
api.reapi.ai-specific extension. Selects a token group on the gateway, which routes the request to a specific upstream channel pool. Omit if default routing is fine.
Vision input (multimodal)
Send images alongside text via OpenAI content-parts:
{
"model": "claude-opus-4-8",
"max_tokens": 4096,
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "What does this chart show?" },
{
"type": "image_url",
"image_url": { "url": "https://example.com/chart.png" }
}
]
}
]
}

Supported image formats: PNG, JPEG, GIF, WebP. Base64 URLs work too:
prefix data:image/png;base64,.... Each image counts toward the input
token budget based on its resolution.
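
For local files, the image bytes can be wrapped in a base64 data URL. A minimal sketch, with `image_part` as a hypothetical helper name:

```python
import base64

SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def image_part(data: bytes, mime: str = "image/png") -> dict:
    """Build an image_url content part from raw image bytes."""
    if mime not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image type: {mime}")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }
```

The result goes into the `content` array next to a text part, in place of the remote URL shown in the request above.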
Prompt caching
Anthropic's prompt caching pays off on stable system prompts, recurring RAG context, and long multi-turn agent histories. The first call pays the cache-write rate on the cacheable region; subsequent calls within the cache window pay only the (much lower) cache-read rate on those tokens.
To enable caching, call /v1/messages natively and add a
cache_control block. Example:
{
"model": "claude-opus-4-8",
"max_tokens": 4096,
"system": [
{
"type": "text",
"text": "<your long stable system prompt>",
"cache_control": { "type": "ephemeral" }
}
],
"messages": [
{ "role": "user", "content": "Question for the assistant" }
]
}

The cache key is the hash of the cacheable content. See api.reapi.ai/pricing for cache-read and cache-write rates.
Response shape — /v1/chat/completions
Non-streaming (stream: false)
{
"id": "chatcmpl-018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
"object": "chat.completion",
"created": 1735000000,
"model": "claude-opus-4-8",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 9,
"total_tokens": 21,
"prompt_tokens_details": {
"cached_tokens": 0
}
}
}

usage.prompt_tokens_details.cached_tokens reports how many input
tokens were served from cache — the part billed at the cache-read rate
rather than the standard input rate.
Streaming (stream: true)
Content-Type: text/event-stream. Each data: line is a JSON delta in
the OpenAI chunk format; the final event before [DONE] carries the
finish_reason (stop / length / tool_calls / content_filter).
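
Once the stream has been decoded into chunk dicts, the text and the finish reason can be collected in one pass. This is a sketch assuming the standard OpenAI chunk shape; `collect_stream` is a hypothetical helper name.

```python
from typing import Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Join the content deltas and return them with the last finish_reason seen."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # tolerate chunks that carry no choices
        choice = choices[0]
        parts.append((choice.get("delta") or {}).get("content") or "")
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason
```

A result of `length` signals that the output cap was reached, while `tool_calls` means the model is waiting for tool results rather than finishing its answer.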
Pricing
Claude Opus 4.8 is billed pay-as-you-go in USD against your api.reapi.ai token balance. It bills along several dimensions — input tokens, output tokens, cache-read tokens, and per-request web search. Current rates live on api.reapi.ai/pricing and in the pricing card at the top of the model page.
Per-call bill:
input_bill = (prompt_tokens - cached_tokens) × input_rate / 1,000,000
cache_read_bill = cached_tokens × cache_read_rate / 1,000,000
output_bill = completion_tokens × output_rate / 1,000,000

The cache-write rate applies on the first call that writes a cache block; subsequent hits pay only the cache-read rate. Web search, when used, is billed per request. Failed requests are not charged.
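
The per-call formula above can be applied directly to the `usage` object of a response. In this sketch the function name `call_cost` is hypothetical and the rates in the usage example are made-up placeholders, not actual prices; cache-write and web-search charges are left out.

```python
def call_cost(usage: dict, input_rate: float, cache_read_rate: float,
              output_rate: float) -> float:
    """Estimate the USD cost of one call; rates are USD per 1M tokens."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    fresh_input = usage["prompt_tokens"] - cached
    total = (fresh_input * input_rate
             + cached * cache_read_rate
             + usage["completion_tokens"] * output_rate)
    return total / 1_000_000


# Placeholder rates for illustration only; see the pricing page for real values.
example_usage = {
    "prompt_tokens": 1_000_000,
    "completion_tokens": 100_000,
    "prompt_tokens_details": {"cached_tokens": 400_000},
}
estimate = call_cost(example_usage, input_rate=10, cache_read_rate=1, output_rate=50)
```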
Limits
| Limit | Value |
|---|---|
| Context window | 1M tokens |
| Max output per call | 128K tokens |
Streams that hit the output cap finish with finish_reason: "length";
call again with a continuation message if you need more text.
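
One possible shape for the continuation loop is sketched below. The `send` callable is an assumption: it stands for whatever function posts the history and returns the reply text together with its `finish_reason`. The wording of the continuation prompt is also only an example.

```python
from typing import Callable, List, Tuple


def complete_with_continuation(
    send: Callable[[List[dict]], Tuple[str, str]],
    messages: List[dict],
    max_rounds: int = 3,
) -> str:
    """Call send repeatedly while replies are cut off by the output cap."""
    history = list(messages)
    parts = []
    for _ in range(max_rounds):
        text, finish_reason = send(history)
        parts.append(text)
        if finish_reason != "length":
            break  # the model reached a natural end (or another stop condition)
        # Feed the partial answer back and ask for the rest.
        history = history + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Continue exactly where you left off."},
        ]
    return "".join(parts)
```

Capping the number of rounds keeps a runaway generation from consuming balance indefinitely.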
Errors
The error envelope follows the OpenAI shape — HTTP status, plus a JSON body:
{
"error": {
"message": "...",
"type": "invalid_request_error",
"code": "..."
}
}

Common cases:
| Status | When | Notes |
|---|---|---|
| 400 | Missing max_tokens, bad shape, etc. | Anthropic requires max_tokens; OpenAI SDKs that omit it will 400 here |
| 401 | Missing / invalid API key | Re-issue a key at api.reapi.ai |
| 402 | Insufficient balance | Top up at api.reapi.ai |
| 429 | Per-group rate limit hit | Back off, or move to a different group |
| 500 | Upstream / gateway error | Safe to retry; failed calls are not charged |
api.reapi.ai does not internally retry chat requests. Every customer call maps to exactly one upstream POST. If a network error reaches you, that's a one-for-one wire failure and a retry from your side is safe; the upstream provider may have already produced output, but the gateway will not double-bill.
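
Since the gateway does not retry on its own, a client-side retry with exponential backoff is one way to handle 429 and 500 responses. In this sketch `send` is an assumed callable that performs one request and returns the status code with the parsed body; the attempt count and delays are arbitrary defaults.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUSES = {429, 500}


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Retry rate-limit and gateway errors, doubling the wait each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            return status, body
        sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    raise AssertionError("unreachable")
```

Errors such as 400, 401, and 402 are returned immediately, since repeating the same request would not change the outcome.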
Recipes
Minimum request
{
"model": "claude-opus-4-8",
"max_tokens": 4096,
"messages": [
{ "role": "user", "content": "Summarise this in three sentences." }
]
}

Tool use (function calling, OpenAI surface)
{
"model": "claude-opus-4-8",
"max_tokens": 4096,
"messages": [
{ "role": "user", "content": "What's the weather in Tokyo today?" }
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Look up the current weather for a city.",
"parameters": {
"type": "object",
"properties": { "city": { "type": "string" } },
"required": ["city"]
}
}
}
],
"tool_choice": "auto"
}

Vision
{
"model": "claude-opus-4-8",
"max_tokens": 4096,
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "Read the error in this screenshot and suggest a fix." },
{
"type": "image_url",
"image_url": { "url": "https://your-cdn.com/screenshot.png" }
}
]
}
]
}

Long context with prompt caching (native Anthropic surface)
{
"model": "claude-opus-4-8",
"max_tokens": 4096,
"system": [
{
"type": "text",
"text": "<800K-token reference document>",
"cache_control": { "type": "ephemeral" }
}
],
"messages": [
{ "role": "user", "content": "Find every mention of the constraint X and list them with line numbers." }
]
}

When to pick Claude Opus 4.8
Pick Claude Opus 4.8 when output quality and reliability dominate the decision:
- Long-horizon agentic coding — multi-service refactors, codebase-scale migrations, and agent runs that must stay on-task across many steps.
- High-stakes reasoning — work where a confident-but-wrong answer has real downstream cost. Opus 4.8 is more likely to flag uncertainty than to overclaim.
- Large-context analysis — full codebases, long research packs, multi-document review, audit work.
Route lighter traffic (classification, short replies, tight loops) to cheaper Claude or GPT models on the same key.
Tips
- Set max_tokens generously. Anthropic enforces it strictly: the model still stops at the natural end of its response, but a low cap will truncate before the real ending.
- Stream by default for chat UX. The first tokens appear as soon as they are generated, which cuts perceived latency.
- Cache the stable parts of long prompts. A 500K-token RAG context on top of a 1KB user question can pay the cache-read rate on every subsequent call instead of the standard input rate, a big saving on multi-turn agents replaying long histories.
- Tune temperature or top_p, not both. Mixing them produces results that are hard to reason about.
- Use the native /v1/messages surface for Anthropic-only features. cache_control, native multi-block content, and the full tool-use spec all work through /v1/messages without needing translation.