claude-fable-5


Claude Fable 5 is Anthropic's most capable widely released model, a tier above Opus, built for the most demanding reasoning and long-horizon agentic work. api.reapi.ai exposes it as a drop-in OpenAI-compatible Chat Completions endpoint; the native Anthropic /v1/messages surface is also available. It offers a 1M-token context window, 128K max output tokens, always-on adaptive thinking, vision input, prompt caching, and tool use. Current rates live on the model page and on api.reapi.ai/pricing.

Quick example

curl https://api.reapi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-fable-5",
    "group": "default",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "stream": true,
    "max_tokens": 4096
  }'

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai/v1",
)

stream = client.chat.completions.create(
    model="claude-fable-5",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    max_tokens=4096,
    extra_body={"group": "default"},
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai",
)

with client.messages.stream(
    model="claude-fable-5",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.reapi.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "claude-fable-5",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  max_tokens: 4096,
  // `group` is an api.reapi.ai-specific extension. The Node SDK has no
  // `extra_body` option, so it is sent as a top-level body field.
  // @ts-expect-error — not part of the OpenAI types
  group: "default",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

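package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "strings"
)

func main() {
    body, err := json.Marshal(map[string]any{
        "model": "claude-fable-5",
        "group": "default",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello"},
        },
        "stream":     true,
        "max_tokens": 4096,
    })
    if err != nil {
        log.Fatal(err)
    }
    req, err := http.NewRequest("POST",
        "https://api.reapi.ai/v1/chat/completions", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        log.Fatalf("unexpected status: %s", resp.Status)
    }

    // Print each SSE data payload as it arrives instead of buffering the whole stream.
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.HasPrefix(line, "data:") {
            continue // skip blank separator lines
        }
        payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
        if payload == "[DONE]" {
            break
        }
        fmt.Println(payload)
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}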

Authentication

Every request needs a Bearer token. The Claude Fable 5 chat workspace lives on the api.reapi.ai platform — sign in there to create a key and top up tokens.

Open api.reapi.ai and sign in (or create an account).
Generate an API key under API Keys.
Top up tokens under Top Up (pay-as-you-go, billed in USD per 1M tokens — see api.reapi.ai/pricing).

Authorization: Bearer YOUR_API_KEY

The chat surface (api.reapi.ai) is a separate workspace from the image/video/audio task gateway at reapi.ai/api/v1/*. Keys and balances do not cross over — a key issued on reapi.ai/settings/apikeys will not authenticate against api.reapi.ai/v1/chat/completions, and vice versa.

Endpoints

POST https://api.reapi.ai/v1/chat/completions   # OpenAI-compatible
POST https://api.reapi.ai/v1/messages           # Anthropic-native

Both surfaces accept claude-fable-5. Pick whichever matches your SDK of record:

/v1/chat/completions — drop-in for the OpenAI SDKs. Same request shape, same SSE wire format. Set base_url to https://api.reapi.ai/v1.
/v1/messages — native Anthropic Messages format. Set base_url to https://api.reapi.ai for the Anthropic Python / TypeScript SDKs. Required for callers that need Anthropic-specific features (cache_control blocks for prompt caching, the effort parameter, summarized thinking display, native multi-block content, the full tool-use spec).

Request body — `/v1/chat/completions`

`model` — string, required

Must be "claude-fable-5". The value is echoed back in the response envelope.

`messages` — array, required

Conversation history as an array of message objects. Same shape as the OpenAI Chat Completions spec, plus content-parts for vision:

{
  "role": "system" | "user" | "assistant" | "tool",
  "content": "string OR content-parts array (text + image_url parts)"
}

Multi-turn history is sent in chronological order — the last message is the one Claude responds to.
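
A minimal sketch of assembling multi-turn history in chronological order. The helper name `append_turn` is hypothetical; only the `role` / `content` shape comes from the spec above, and plain-string content is assumed for brevity.

```python
from typing import Dict, List

Message = Dict[str, str]
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def append_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history list with one message appended at the end."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return history + [{"role": role, "content": content}]


history: List[Message] = []
history = append_turn(history, "system", "You are a terse assistant.")
history = append_turn(history, "user", "Name one prime number.")
history = append_turn(history, "assistant", "2")
history = append_turn(history, "user", "And the next one?")

# The last entry is the message the model responds to.
request_body = {
    "model": "claude-fable-5",
    "max_tokens": 4096,
    "messages": history,
}
```

Returning a new list on each call keeps earlier snapshots of the conversation intact, which is convenient when replaying or branching a dialogue.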

`max_tokens` — integer, default `4096`

Upper bound on output tokens. Anthropic's API requires max_tokens on every call, including streamed ones — even though the OpenAI SDKs treat it as optional. Set it generously (128000 is the hard cap on the synchronous API) for long-form outputs; the model still stops at the natural end of its response.

`stream` — boolean, default `false`

When true, the response is streamed as server-sent events (SSE) with Content-Type: text/event-stream. Each event is a JSON delta in the OpenAI format, terminated by a data: [DONE] line.
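
If you are not using an SDK, the stream can be consumed by hand. The sketch below parses SSE lines of the kind described above and yields only the text deltas; `iter_text_deltas` is a hypothetical helper name, and the sample lines are illustrative rather than captured from the gateway.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
assembled = "".join(iter_text_deltas(sample_lines))
```

In a real client, `lines` would be the decoded line iterator of the HTTP response body.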

`tools` / `tool_choice` — optional

Standard OpenAI tool-calling parameters. Claude Fable 5 supports the full OpenAI tool-use spec via this surface. For Anthropic's native tool-use schema (with cache_control, server-side tools, etc.) call /v1/messages directly.

`group` — string, default `"default"`

api.reapi.ai-specific extension. Selects a token group on the gateway, which routes the request to a specific upstream channel pool. Omit if default routing is fine.

No sampling parameters. Claude Fable 5 does not accept temperature, top_p, or top_k — Anthropic removed them on this model generation and requests that include them are rejected upstream. Steer style and variance through prompting instead.
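
When migrating request builders written for other models, one option is to drop the rejected keys before sending. This is a sketch under that assumption; `strip_sampling_params` is a hypothetical helper, not part of any SDK.

```python
from typing import Any, Dict

UNSUPPORTED_SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def strip_sampling_params(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request body without the rejected sampling keys."""
    return {k: v for k, v in body.items() if k not in UNSUPPORTED_SAMPLING_KEYS}


legacy_body = {
    "model": "claude-fable-5",
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9,
    "messages": [{"role": "user", "content": "Hello"}],
}
clean_body = strip_sampling_params(legacy_body)
```

The original dictionary is left untouched, so the same builder can still serve models that do accept these parameters.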

Adaptive thinking — always on

Adaptive thinking is the only thinking mode on Claude Fable 5. It applies on every call — there is no way to disable it — and the model decides per request how much reasoning the task needs. Two consequences for integrators:

Raw chain-of-thought is never returned. Thinking blocks arrive with empty content by default. On the native /v1/messages surface, set thinking: type adaptive, display summarized (see Anthropic's adaptive-thinking docs for the exact field shape) to receive readable summarized thinking.
Depth is tuned with effort, not a token budget. On the native surface, output_config.effort accepts low through max. Higher effort means deeper reasoning and more output tokens; lower effort means faster, cheaper calls.
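
A minimal sketch of a native request body that sets `output_config.effort`. Only `low` and `max` are named above, so the helper passes the value through without validating it against a list; `native_request` is a hypothetical name, and the `thinking` display field is left out because its exact shape is deferred to Anthropic's docs.

```python
from typing import Any, Dict, List


def native_request(
    messages: List[Dict[str, Any]],
    effort: str,
    max_tokens: int = 4096,
) -> Dict[str, Any]:
    """Build a /v1/messages body that sets output_config.effort."""
    if not effort:
        raise ValueError("effort must be a non-empty string")
    return {
        "model": "claude-fable-5",
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},
        "messages": messages,
    }


question = [{"role": "user", "content": "Is 97 prime?"}]
quick = native_request(question, effort="low")   # faster, cheaper call
deep = native_request(question, effort="max")    # deeper reasoning, more output tokens
```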

Vision input (multimodal)

Send images alongside text via OpenAI content-parts:

{
  "model": "claude-fable-5",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What does this chart show?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/chart.png" }
        }
      ]
    }
  ]
}

Supported image formats: PNG, JPEG, GIF, WebP. Each image counts toward the input token budget based on its resolution.

Prompt caching

Anthropic's prompt caching pays off on stable system prompts, recurring RAG context, and long multi-turn agent histories. The first call pays the cache-write rate on the cacheable region; subsequent calls within the cache window pay only the (much lower) cache-read rate on those tokens.

To enable caching, call /v1/messages natively and add a cache_control block. Example:

{
  "model": "claude-fable-5",
  "max_tokens": 4096,
  "system": [
    {
      "type": "text",
      "text": "<your long stable system prompt>",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [
    { "role": "user", "content": "Question for the assistant" }
  ]
}

The cache key is the hash of the cacheable content. See api.reapi.ai/pricing for cache-read and cache-write rates.
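
Because the key is derived from the cacheable content, any change to that content, even an appended timestamp, should produce a different key and therefore a cache miss. The illustration below uses SHA-256 purely as a stand-in; the hash function the service actually uses is not stated here, and `illustrative_cache_key` is a hypothetical name.

```python
import hashlib


def illustrative_cache_key(text: str) -> str:
    """Stand-in for a content-derived cache key (assumed, not the real algorithm)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


stable_prompt = "You are a support agent for ExampleCo. Answer in two sentences."
with_timestamp = stable_prompt + " Current time: 2025-01-01T00:00:00Z"

# Identical content maps to the same key, so a repeat call can hit the cache.
same_key = illustrative_cache_key(stable_prompt) == illustrative_cache_key(stable_prompt)

# Volatile data inside the cached block changes the key on every call.
key_changed = illustrative_cache_key(stable_prompt) != illustrative_cache_key(with_timestamp)
```

In practice this suggests keeping volatile values (timestamps, request IDs, per-user data) out of the block marked with `cache_control` and placing them in the later, uncached messages.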

Refusals and fallback

Claude Fable 5 includes safety classifiers that can decline certain requests. On the native /v1/messages surface a refusal is returned as a successful response with stop_reason: "refusal" — not an HTTP error — and the response reports which classifier declined the request.

Two things follow:

A request refused before any output is generated is not billed.
A refused request can usually be served by another Claude model. Retry it with a different model value on the same key — Claude Opus 4.8 covers most workloads the classifier declines on Fable 5.

Handle stop_reason: "refusal" (native surface) or an empty completion with a refusal marker (OpenAI surface) explicitly in your integration rather than treating it as a transport failure: it is deterministic for a given prompt, so a verbatim retry on the same model will refuse again.
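
The fallback described above can be sketched as a loop over candidate models on the native surface. The transport is injected as a `send` callable so the logic stays independent of any HTTP client. The fallback id `claude-opus-4-8` is a guess at the identifier for Claude Opus 4.8, so check the gateway's model list before relying on it.

```python
from typing import Any, Callable, Dict, Sequence

# The second id is hypothetical; only "claude-fable-5" is confirmed by this page.
FALLBACK_MODELS = ("claude-fable-5", "claude-opus-4-8")


def send_with_refusal_fallback(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    body: Dict[str, Any],
    models: Sequence[str] = FALLBACK_MODELS,
) -> Dict[str, Any]:
    """Try each model once; move on only when the response is a refusal."""
    response: Dict[str, Any] = {}
    for model in models:
        response = send({**body, "model": model})
        if response.get("stop_reason") != "refusal":
            return response
    return response  # every model refused; surface the last refusal


def fake_send(body: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a POST to /v1/messages, used only to exercise the loop."""
    if body["model"] == "claude-fable-5":
        return {"model": body["model"], "stop_reason": "refusal", "content": []}
    return {
        "model": body["model"],
        "stop_reason": "end_turn",
        "content": [{"type": "text", "text": "ok"}],
    }


result = send_with_refusal_fallback(
    fake_send,
    {"max_tokens": 4096, "messages": [{"role": "user", "content": "Hello"}]},
)
```

Each model is tried exactly once, which matches the point that a verbatim retry on the same model will refuse again.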

Response shape — `/v1/chat/completions`

Non-streaming (`stream: false`)

{
  "id": "chatcmpl-018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "object": "chat.completion",
  "created": 1749600000,
  "model": "claude-fable-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

usage.prompt_tokens_details.cached_tokens reports how many input tokens were served from cache — the part billed at the cache-read rate rather than the standard input rate.

Streaming (`stream: true`)

Content-Type: text/event-stream. Each data: line is a JSON delta in the OpenAI chunk format; the final event before [DONE] carries the finish_reason (stop / length / tool_calls / content_filter).

Pricing

Claude Fable 5 is billed pay-as-you-go in USD against your api.reapi.ai token balance. Billing has several dimensions: input tokens, output tokens, cache-write and cache-read tokens, and per-request web search. The full 1M context window is billed at standard per-token rates with no long-context premium, and requests refused before any output is generated are not billed. Current rates live on api.reapi.ai/pricing and in the pricing card at the top of the model page.

Per-call bill:

billable_input  = (prompt_tokens - cached_tokens) × input_rate      / 1,000,000
cache_read_bill = cached_tokens                   × cache_read_rate  / 1,000,000
output_bill     = completion_tokens               × output_rate      / 1,000,000

Cache-write rate applies on the first call that writes a cache block; subsequent hits pay only the cache-read rate. Web search, when used, is billed per request. Failed requests are not charged.
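
As a worked example, the three formulas above can be applied directly to the `usage` object of a response. The rates below are placeholders chosen for easy arithmetic, not real prices, and the sketch leaves out cache-write and web-search charges.

```python
from typing import Any, Dict

PER_MILLION = 1_000_000

# Placeholder USD rates per 1M tokens, for illustration only.
PLACEHOLDER_RATES = {"input": 10.0, "cache_read": 1.0, "output": 50.0}


def estimate_bill(usage: Dict[str, Any], rates: Dict[str, float]) -> Dict[str, float]:
    """Apply the per-call formulas to an OpenAI-style usage object."""
    prompt = usage["prompt_tokens"]
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    completion = usage["completion_tokens"]
    parts = {
        "billable_input": (prompt - cached) * rates["input"] / PER_MILLION,
        "cache_read_bill": cached * rates["cache_read"] / PER_MILLION,
        "output_bill": completion * rates["output"] / PER_MILLION,
    }
    parts["total"] = sum(parts.values())
    return parts


usage = {
    "prompt_tokens": 12_000,
    "completion_tokens": 500,
    "prompt_tokens_details": {"cached_tokens": 10_000},
}
bill = estimate_bill(usage, PLACEHOLDER_RATES)
```

With these placeholder rates, the 10,000 cached tokens cost half as much as the 2,000 uncached input tokens, which shows why a high cache hit ratio dominates the input side of the bill.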

Limits

Limit	Value
Context window	1M tokens
Max output per call	128K tokens

Streams that hit the output cap finish with finish_reason: "length"; call again with a continuation message if you need more text.
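
One way to implement the continuation is sketched below for the OpenAI surface. The shape of the continuation turn (replaying the partial answer as an assistant message, then a short user prompt) is an assumption rather than something this page prescribes, and `send` is an injected stand-in for the HTTP call.

```python
from typing import Any, Callable, Dict, List


def complete_with_continuation(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    messages: List[Dict[str, Any]],
    max_rounds: int = 3,
) -> str:
    """Concatenate output across calls while finish_reason is "length"."""
    history = list(messages)
    pieces: List[str] = []
    for _ in range(max_rounds):
        response = send(
            {"model": "claude-fable-5", "max_tokens": 4096, "messages": history}
        )
        choice = response["choices"][0]
        text = choice["message"]["content"] or ""
        pieces.append(text)
        if choice["finish_reason"] != "length":
            break
        # Assumed continuation shape: replay the partial answer, then ask for more.
        history = history + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Continue exactly where you left off."},
        ]
    return "".join(pieces)


scripted = iter([
    {"choices": [{"message": {"role": "assistant", "content": "Part one, "},
                  "finish_reason": "length"}]},
    {"choices": [{"message": {"role": "assistant", "content": "part two."},
                  "finish_reason": "stop"}]},
])
full_text = complete_with_continuation(
    lambda body: next(scripted),
    [{"role": "user", "content": "Write a long report."}],
)
```

The `max_rounds` bound keeps a runaway generation from looping indefinitely.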

Errors

The error envelope follows the OpenAI shape — HTTP status, plus a JSON body:

{
  "error": {
    "message": "...",
    "type": "invalid_request_error",
    "code": "..."
  }
}

Common cases:

Status	When	Notes
`400`	Missing `max_tokens`, sampling params sent, bad shape	Anthropic requires `max_tokens`; `temperature` / `top_p` / `top_k` are rejected on this model
`401`	Missing / invalid API key	Re-issue a key at api.reapi.ai
`402`	Insufficient balance	Top up at api.reapi.ai
`429`	Per-group rate limit hit	Back off, or move to a different `group`
`500`	Upstream / gateway error	Safe to retry — failed calls are not charged

api.reapi.ai does not internally retry chat requests. Every customer call maps to exactly one upstream POST. If a network error reaches you, that's a one-for-one wire failure and a retry from your side is safe; the upstream provider may have already produced output, but the gateway will not double-bill.
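
Since the gateway does not retry for you, a client-side policy along these lines may be useful: retry `429` and `500` with exponential backoff and jitter, and return everything else immediately, because `400`, `401`, and `402` need a fix on the caller's side. The delays and attempt count are arbitrary defaults, and `send` is an injected stand-in that returns `(status, payload)`.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple

RETRYABLE_STATUSES = {429, 500}


def call_with_retry(
    send: Callable[[], Tuple[int, Dict[str, Any]]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], Any] = time.sleep,
) -> Tuple[int, Dict[str, Any]]:
    """Retry 429/500 with exponential backoff; return other statuses at once."""
    status, payload = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUSES:
            break
        # Backoff doubles each round; jitter spreads out concurrent clients.
        sleep(base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay))
        status, payload = send()
    return status, payload


responses = iter([
    (500, {"error": {"message": "upstream error"}}),
    (429, {"error": {"message": "rate limited"}}),
    (200, {"choices": []}),
])
delays = []  # records the requested sleeps instead of actually waiting
status, payload = call_with_retry(lambda: next(responses), sleep=delays.append)
```

Passing `sleep` as a parameter makes the policy testable without real waiting.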

Recipes

Minimum request

{
  "model": "claude-fable-5",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "Summarise this in three sentences." }
  ]
}

Tool use (function calling, OpenAI surface)

{
  "model": "claude-fable-5",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "What's the weather in Tokyo today?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
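
When the model decides to call the tool, the assistant message carries `tool_calls` in the standard OpenAI shape, and the caller is expected to run each one and send the results back as `role: "tool"` messages. The sketch below shows that second half; `get_weather` here is a hypothetical local implementation, and the sample assistant message is hand-written rather than captured output.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> str:
    """Hypothetical local implementation of the tool declared in the request."""
    return f"Sunny in {city}"


TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: Dict[str, Any]) -> List[Dict[str, str]]:
    """Execute each requested tool and build the matching tool-role messages."""
    results: List[Dict[str, str]] = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOL_REGISTRY[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not as an object.
        args = json.loads(call["function"]["arguments"])
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": fn(**args)}
        )
    return results


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
        }
    ],
}
tool_messages = run_tool_calls(assistant_message)
```

The follow-up request then sends the original messages, the assistant message, and `tool_messages` in that order so the model can produce its final answer.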

Vision

{
  "model": "claude-fable-5",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Read the error in this screenshot and suggest a fix." },
        {
          "type": "image_url",
          "image_url": { "url": "https://your-cdn.com/screenshot.png" }
        }
      ]
    }
  ]
}

Long context with prompt caching (native Anthropic surface)

{
  "model": "claude-fable-5",
  "max_tokens": 4096,
  "system": [
    {
      "type": "text",
      "text": "<800K-token reference document>",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [
    { "role": "user", "content": "Find every mention of the constraint X and list them with line numbers." }
  ]
}

When to pick Claude Fable 5

Pick Claude Fable 5 when the task sits past what Opus-tier models handle cleanly:

Frontier long-horizon agentic work — week-long refactors, codebase-scale migrations, autonomous runs that must stay coherent across hundreds of steps.
The most demanding reasoning — deep multi-step problems, ambiguous specifications, analysis where the answer quality justifies the top capability tier.
Large-context analysis — full codebases, long research packs, multi-document review, audit work across the 1M window.

Route everyday premium coding to Claude Opus 4.8 and lighter traffic (classification, short replies, tight loops) to cheaper Claude or GPT models on the same key.

Tips

Set max_tokens generously. Anthropic enforces it strictly — the model still stops at the natural end of its response, but a low cap will truncate before the real ending.
Stream by default for chat UX. Streaming cuts perceived latency dramatically — especially relevant here, since always-on adaptive thinking can add a pause before the first visible token.
Don't send sampling parameters. temperature, top_p, and top_k are rejected on Claude Fable 5. Steer variance and style through prompting.
Cache the stable parts of long prompts. A 500K-token RAG context on top of a 1KB user question can pay the cache-read rate on every subsequent call instead of the standard input rate — a big saving on multi-turn agents replaying long histories.
Handle refusals as a first-class outcome. Detect the refusal stop reason and retry on another Claude model instead of retrying verbatim.
Use the native /v1/messages surface for Anthropic-only features. cache_control, effort, summarized thinking display, native multi-block content, the full tool-use spec — all of those work through /v1/messages without needing translation.
