rreAPI Docs

claude-opus-4-8

Claude Opus 4.8 is Anthropic's most capable model for complex reasoning and long-horizon agentic coding, exposed through api.reapi.ai as a drop-in OpenAI-compatible Chat Completions endpoint (the native Anthropic /v1/messages surface is also available). 1M token context, 128K max output, vision input, prompt caching, and tool use. Current rates live on the model page and on api.reapi.ai/pricing.

Quick example

curl https://api.reapi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-8",
    "group": "default",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "stream": true,
    "max_tokens": 4096,
    "temperature": 0.7
  }'

Python (OpenAI SDK):

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai/v1",
)

stream = client.chat.completions.create(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    max_tokens=4096,
    temperature=0.7,
    extra_body={"group": "default"},
)

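for chunk in stream:
    # Skip chunks that carry no choices (some streams emit usage-only chunks).
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)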

Python (Anthropic SDK):

from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai",
)

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

TypeScript (OpenAI SDK):

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.reapi.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "claude-opus-4-8",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  max_tokens: 4096,
  temperature: 0.7,
  // `group` is an api.reapi.ai-specific extension; it is sent along in the request body.
  // @ts-expect-error — not part of the OpenAI types
  group: "default",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Go:

package main

import (
    "bytes"
    "encoding/json"
    "io"
    "log"
    "net/http"
    "os"
)

func main() {
    // Build the JSON request body.
    body, err := json.Marshal(map[string]any{
        "model": "claude-opus-4-8",
        "group": "default",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello"},
        },
        "stream":      true,
        "max_tokens":  4096,
        "temperature": 0.7,
    })
    if err != nil {
        log.Fatal(err)
    }

    req, err := http.NewRequest("POST",
        "https://api.reapi.ai/v1/chat/completions", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Copy the raw SSE stream to stdout as it arrives instead of buffering it all.
    if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
        log.Fatal(err)
    }
}

Authentication

Every request needs a Bearer token. The Claude Opus 4.8 chat workspace lives on the api.reapi.ai platform — sign in there to create a key and top up tokens.

  1. Open api.reapi.ai and sign in (or create an account).
  2. Generate an API key under API Keys.
  3. Top up tokens under Top Up (pay-as-you-go, billed in USD per 1M tokens — see api.reapi.ai/pricing).

Then send the key in the Authorization header of every request:

Authorization: Bearer YOUR_API_KEY

The chat surface (api.reapi.ai) is a separate workspace from the image/video/audio task gateway at reapi.ai/api/v1/*. Keys and balances do not cross over — a key issued on reapi.ai/settings/apikeys will not authenticate against api.reapi.ai/v1/chat/completions, and vice versa.


Endpoints

POST https://api.reapi.ai/v1/chat/completions   # OpenAI-compatible
POST https://api.reapi.ai/v1/messages           # Anthropic-native

Both surfaces accept claude-opus-4-8. Pick whichever matches your SDK of record:

  • /v1/chat/completions — drop-in for the OpenAI SDKs. Same request shape, same SSE wire format. Set base_url to https://api.reapi.ai/v1.
  • /v1/messages — native Anthropic Messages format. Set base_url to https://api.reapi.ai for the Anthropic Python / TypeScript SDKs. Required for callers that need Anthropic-specific features (cache_control blocks for prompt caching, native multi-block content, the full tool-use spec).
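
As a rough illustration of the two surfaces, the sketch below maps each one to the base URL and path listed above. The helper name endpoint_for and the labels "openai" and "anthropic" are made up for this example; only the URLs come from this page.

```python
# Hypothetical lookup table: the keys are illustrative labels, the URLs are from this page.
SURFACES = {
    "openai": {"base_url": "https://api.reapi.ai/v1", "path": "/chat/completions"},
    "anthropic": {"base_url": "https://api.reapi.ai", "path": "/v1/messages"},
}


def endpoint_for(surface: str) -> str:
    """Return the full POST URL for the chosen surface."""
    try:
        entry = SURFACES[surface]
    except KeyError:
        raise ValueError(f"unknown surface: {surface!r}") from None
    return entry["base_url"] + entry["path"]


print(endpoint_for("openai"))
print(endpoint_for("anthropic"))
```

Note that the SDKs take only the base_url part; they append the path themselves.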

Request body — /v1/chat/completions

model — string, required

Must be "claude-opus-4-8". The value is echoed back in the response envelope.

messages — array, required

Conversation history as an array of message objects. Same shape as the OpenAI Chat Completions spec, plus content-parts for vision:

{
  "role": "system" | "user" | "assistant" | "tool",
  "content": "string OR content-parts array (text + image_url parts)"
}

Multi-turn history is sent in chronological order — the last message is the one Claude responds to.
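
A minimal sketch of keeping a history in chronological order, so that the last entry is always the message to be answered. The add_turn helper is hypothetical and simplified: it handles plain-string content only, and a real "tool" message would also need a tool_call_id.

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one message appended at the end."""
    if role not in {"system", "user", "assistant", "tool"}:
        raise ValueError(f"unsupported role: {role!r}")
    return history + [{"role": role, "content": content}]


history: List[Message] = []
history = add_turn(history, "system", "You are a concise assistant.")
history = add_turn(history, "user", "What is an SSE stream?")
history = add_turn(history, "assistant", "A long-lived HTTP response that sends events.")
history = add_turn(history, "user", "How does it end on this API?")

# The final entry is the message the model responds to.
print(history[-1]["content"])
```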

max_tokens — integer, default 4096

Upper bound on output tokens. Anthropic's API requires max_tokens on every call, including streamed ones, even though the OpenAI SDKs treat it as optional. Set it explicitly: the Errors section lists a missing max_tokens as a 400. For long-form outputs, set it generously (128000 is the hard cap on the synchronous API); the model still stops at the natural end of its response.

stream — boolean, default false

When true, the response is streamed as server-sent events (SSE) with Content-Type: text/event-stream. Each event is a JSON delta in the OpenAI format, terminated by a data: [DONE] line.

temperature — number, default 1

Range 0.0 – 1.0. Sampling temperature; lower values produce more deterministic output. Anthropic recommends adjusting either temperature or top_p, not both.

top_p — number, default 1

Range 0.0 – 1.0. Nucleus sampling cutoff.
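
One way to enforce the "temperature or top_p, not both" advice on the client side is sketched below. The sampling_params helper is hypothetical, and it is stricter than the API description above, which only recommends against mixing the two.

```python
from typing import Dict, Optional


def sampling_params(temperature: Optional[float] = None,
                    top_p: Optional[float] = None) -> Dict[str, float]:
    """Return a dict with at most one of temperature / top_p, range-checked."""
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    params: Dict[str, float] = {}
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        params[name] = value
    return params


# Merge the result into a request body, e.g. {**body, **sampling_params(temperature=0.2)}.
print(sampling_params(temperature=0.2))
```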

tools / tool_choice — optional

Standard OpenAI tool-calling parameters. Claude Opus 4.8 supports the full OpenAI tool-use spec via this surface and uses tools more efficiently than prior Opus models, taking fewer steps for the same result. For Anthropic's native tool-use schema (cache_control, the object-form tool_choice with its type field, etc.), call /v1/messages directly.

group — string, default "default"

api.reapi.ai-specific extension. Selects a token group on the gateway, which routes the request to a specific upstream channel pool. Omit if default routing is fine.


Vision input (multimodal)

Send images alongside text via OpenAI content-parts:

{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What does this chart show?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/chart.png" }
        }
      ]
    }
  ]
}

Supported image formats: PNG, JPEG, GIF, WebP. Base64 data URLs work too: put a URL of the form data:image/png;base64,... in the url field. Each image counts toward the input token budget based on its resolution.
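
A short sketch of building the base64 form from raw bytes. The image_part helper is hypothetical; the MIME types are the standard ones for the four formats listed above, and the sample bytes are a placeholder rather than a real image.

```python
import base64
from typing import Dict


def image_part(data: bytes, mime_type: str = "image/png") -> Dict[str, object]:
    """Wrap raw image bytes as an image_url content part using a data URL."""
    allowed = {"image/png", "image/jpeg", "image/gif", "image/webp"}
    if mime_type not in allowed:
        raise ValueError(f"unsupported image type: {mime_type}")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes stand in for a real file, e.g. open("chart.png", "rb").read().
fake_png = b"\x89PNG\r\n\x1a\n"
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this chart show?"},
        image_part(fake_png),
    ],
}
print(message["content"][1]["image_url"]["url"])
```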


Prompt caching

Anthropic's prompt caching pays off on stable system prompts, recurring RAG context, and long multi-turn agent histories. The first call pays the cache-write rate on the cacheable region; subsequent calls within the cache window pay only the (much lower) cache-read rate on those tokens.

To enable caching, call /v1/messages natively and add a cache_control block. Example:

{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "system": [
    {
      "type": "text",
      "text": "<your long stable system prompt>",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [
    { "role": "user", "content": "Question for the assistant" }
  ]
}

The cache key is the hash of the cacheable content. See api.reapi.ai/pricing for cache-read and cache-write rates.
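
Because the cache key depends on the cacheable content, the stable prefix has to be identical from call to call. The sketch below builds two /v1/messages bodies that share one system block and differ only in the user turn. The cached_request helper and the sample questions are illustrative, not part of any SDK.

```python
from typing import Any, Dict


def cached_request(system_text: str, question: str,
                   max_tokens: int = 4096) -> Dict[str, Any]:
    """Build a /v1/messages body whose system block is marked cacheable."""
    return {
        "model": "claude-opus-4-8",
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_text,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }


reference = "<your long stable system prompt>"
first = cached_request(reference, "Summarise section 2.")
second = cached_request(reference, "List the open questions.")

# The cacheable region is identical across calls; only the user turn differs.
print(first["system"] == second["system"])
```

Anything that changes the system text between calls (a timestamp, a per-user greeting) would change the cacheable content, so keep such values in the user turn.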


Response shape — /v1/chat/completions

Non-streaming (stream: false)

{
  "id": "chatcmpl-018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "object": "chat.completion",
  "created": 1735000000,
  "model": "claude-opus-4-8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

usage.prompt_tokens_details.cached_tokens reports how many input tokens were served from cache — the part billed at the cache-read rate rather than the standard input rate.

Streaming (stream: true)

Content-Type: text/event-stream. Each data: line is a JSON delta in the OpenAI chunk format; the final event before [DONE] carries the finish_reason (stop / length / tool_calls / content_filter).
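
If you are not using an SDK, the stream can be consumed by hand. The sketch below collects the text and the finish_reason from a sequence of SSE lines. The read_stream helper is hypothetical, and the sample events are hand-written in the OpenAI chunk shape rather than captured from the API.

```python
import json
from typing import Iterable, Optional, Tuple


def read_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Collect streamed text and the finish_reason from SSE lines."""
    text_parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text_parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(text_parts), finish_reason


# Hand-written sample events (assumed shape, for illustration only).
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(read_stream(sample))
```

In a real client the lines would come from the HTTP response body, read incrementally.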


Pricing

Claude Opus 4.8 is billed pay-as-you-go in USD against your api.reapi.ai token balance. It bills along several dimensions: input tokens, output tokens, cache-read and cache-write tokens, and per-request web search. Current rates live on api.reapi.ai/pricing and in the pricing card at the top of the model page.

Per-call bill:

billable_input  = (prompt_tokens - cached_tokens) × input_rate      / 1,000,000
cache_read_bill = cached_tokens                   × cache_read_rate  / 1,000,000
output_bill     = completion_tokens               × output_rate      / 1,000,000

Cache-write rate applies on the first call that writes a cache block; subsequent hits pay only the cache-read rate. Web search, when used, is billed per request. Failed requests are not charged.
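
The three formula lines can be worked through in code. In the sketch below, the rates are placeholders chosen to keep the arithmetic simple, not real prices, and summing the three lines into one total is an assumption. Cache-write and web-search charges are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. The values used below are placeholders, not real prices."""
    input_rate: float
    cache_read_rate: float
    output_rate: float


def call_cost(prompt_tokens: int, cached_tokens: int,
              completion_tokens: int, rates: Rates) -> float:
    """Apply the three per-call formula lines and return their sum in USD."""
    billable_input = (prompt_tokens - cached_tokens) * rates.input_rate / 1_000_000
    cache_read_bill = cached_tokens * rates.cache_read_rate / 1_000_000
    output_bill = completion_tokens * rates.output_rate / 1_000_000
    return billable_input + cache_read_bill + output_bill


# Placeholder rates: 10 / 1 / 50 USD per 1M tokens.
example_rates = Rates(input_rate=10.0, cache_read_rate=1.0, output_rate=50.0)
cost = call_cost(prompt_tokens=200_000, cached_tokens=150_000,
                 completion_tokens=2_000, rates=example_rates)
# 50,000 uncached input tokens -> 0.50, 150,000 cached -> 0.15, 2,000 output -> 0.10
print(f"{cost:.2f}")
```

The token counts map directly onto usage.prompt_tokens, usage.prompt_tokens_details.cached_tokens, and usage.completion_tokens in the response.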


Limits

Limit                 Value
Context window        1M tokens
Max output per call   128K tokens

Streams that hit the output cap finish with finish_reason: "length"; call again with a continuation message if you need more text.
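
A continuation loop might look roughly like the following. Everything here is an assumption for illustration: the complete callable stands in for a real API call and returns (text, finish_reason), and the wording of the continuation message is just one option.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
# Hypothetical signature: takes the message list, returns (text, finish_reason).
CompleteFn = Callable[[List[Message]], Tuple[str, str]]


def complete_with_continuation(messages: List[Message], complete: CompleteFn,
                               max_rounds: int = 3) -> str:
    """Call again while finish_reason is "length", feeding back the partial text."""
    history = list(messages)
    parts: List[str] = []
    for _ in range(max_rounds):
        text, finish_reason = complete(history)
        parts.append(text)
        if finish_reason != "length":
            break
        # Replay the partial answer, then ask the model to carry on.
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Continue exactly where you left off."})
    return "".join(parts)


# Stub standing in for a real API call: truncates once, then finishes.
replies = iter([("First half, ", "length"), ("second half.", "stop")])
result = complete_with_continuation(
    [{"role": "user", "content": "Write a long report."}],
    lambda history: next(replies),
)
print(result)
```

The max_rounds cap keeps a misbehaving loop from spending tokens indefinitely.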


Errors

The error envelope follows the OpenAI shape — HTTP status, plus a JSON body:

{
  "error": {
    "message": "...",
    "type": "invalid_request_error",
    "code": "..."
  }
}

Common cases:

Status  When                                 Notes
400     Missing max_tokens, bad shape, etc.  Anthropic requires max_tokens; OpenAI SDKs that omit it will 400 here
401     Missing / invalid API key            Re-issue a key at api.reapi.ai
402     Insufficient balance                 Top up at api.reapi.ai
429     Per-group rate limit hit             Back off, or move to a different group
500     Upstream / gateway error             Safe to retry; failed calls are not charged

api.reapi.ai does not internally retry chat requests. Every customer call maps to exactly one upstream POST. If a network error reaches you, that's a one-for-one wire failure and a retry from your side is safe; the upstream provider may have already produced output, but the gateway will not double-bill.
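
Since the gateway does not retry on your behalf, a client-side retry wrapper is a reasonable pattern. The sketch below retries 429 and 500 with exponential backoff and raises immediately on other error statuses. The send callable, the attempt count, and the delays are assumptions for illustration, not documented values.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical signature: performs one POST and returns (http_status, parsed_json_body).
SendFn = Callable[[], Tuple[int, Dict[str, Any]]]

RETRYABLE = {429, 500}


def send_with_retry(send: SendFn, max_attempts: int = 4,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Retry 429 and 500 with exponential backoff; raise on other errors."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        message = body.get("error", {}).get("message", "unknown error")
        if status not in RETRYABLE or attempt == max_attempts:
            raise RuntimeError(f"HTTP {status}: {message}")
        # Wait 0.5s, 1s, 2s, ... plus a little jitter before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
    raise RuntimeError("no attempts were made")


# Stub transport: one rate-limit response, then success.
responses = iter([
    (429, {"error": {"message": "rate limited"}}),
    (200, {"choices": [{"message": {"role": "assistant", "content": "ok"}}]}),
])
result = send_with_retry(lambda: next(responses), sleep=lambda seconds: None)
print(result["choices"][0]["message"]["content"])
```

Statuses such as 400, 401, and 402 are raised straight away, because repeating the same request would not change the outcome.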


Recipes

Minimum request

{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "Summarise this in three sentences." }
  ]
}

Tool use (function calling, OpenAI surface)

{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "What's the weather in Tokyo today?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
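
The request above is only the first half of a tool-use exchange. The sketch below shows the second half under the standard OpenAI tool-calling shape: read tool_calls from the assistant message, run the matching local function, and build the role "tool" replies to send back. The get_weather stub, the TOOLS registry, and the sample assistant message are all made up for illustration.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> str:
    """Stand-in tool; a real one would query a weather service."""
    return f"Sunny in {city}"


TOOLS: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: Dict[str, Any]) -> List[Dict[str, str]]:
    """Execute each requested tool and build the matching role="tool" replies."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        arguments = json.loads(call["function"]["arguments"])  # JSON-encoded string
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": TOOLS[name](**arguments),
        })
    return replies


# Hand-written assistant turn in the OpenAI tool_calls shape (not captured from the API).
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    }],
}
print(run_tool_calls(assistant_message))
```

To finish the exchange, append the assistant message and the tool replies to messages and call the endpoint again.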

Vision

{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Read the error in this screenshot and suggest a fix." },
        {
          "type": "image_url",
          "image_url": { "url": "https://your-cdn.com/screenshot.png" }
        }
      ]
    }
  ]
}

Long context with prompt caching (native Anthropic surface)

{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "system": [
    {
      "type": "text",
      "text": "<800K-token reference document>",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [
    { "role": "user", "content": "Find every mention of the constraint X and list them with line numbers." }
  ]
}

When to pick Claude Opus 4.8

Pick Claude Opus 4.8 when output quality and reliability dominate the decision:

  • Long-horizon agentic coding — multi-service refactors, codebase-scale migrations, and agent runs that must stay on-task across many steps.
  • High-stakes reasoning — work where a confident-but-wrong answer has real downstream cost. Opus 4.8 is more likely to flag uncertainty than to overclaim.
  • Large-context analysis — full codebases, long research packs, multi-document review, audit work.

Route lighter traffic (classification, short replies, tight loops) to cheaper Claude or GPT models on the same key.


Tips

  • Set max_tokens generously. Anthropic enforces it strictly — the model still stops at the natural end of its response, but a low cap will truncate before the real ending.
  • Stream by default for chat UX. Streaming cuts perceived latency dramatically.
  • Cache the stable parts of long prompts. With a 500K-token RAG context and a 1KB user question, every call after the first (within the cache window) pays the cache-read rate on the context instead of the standard input rate, which is a big saving for multi-turn agents replaying long histories.
  • Tune temperature or top_p, not both. Mixing them produces results that are hard to reason about.
  • Use the native /v1/messages surface for Anthropic-only features. cache_control, native multi-block content, full tool-use spec — all of those work through /v1/messages without needing translation.

Related

  • Authentication
  • Quickstart
  • Errors catalog
