What Is Claude Opus 4.8? Anthropic's New Model Explained

Anthropic shipped Claude Opus 4.8 on May 28, 2026, as an upgrade to Opus 4.7 that the company describes as a modest but tangible improvement^[1]. It is Anthropic's most capable model for complex reasoning and long-horizon agentic coding, and it launched at the same price as its predecessor. The headline change is not a benchmark leap but reliability: Claude Opus 4.8 is roughly four times less likely than Opus 4.7 to let a flaw in code it wrote pass unremarked^[1].

This guide explains what Claude Opus 4.8 is, what changed from 4.7, its published benchmarks, what it costs, and how to call it through an OpenAI-compatible API. Every number below comes from Anthropic's own announcement, model overview, and pricing pages, retrieved on May 30, 2026.

What is Claude Opus 4.8?

Claude Opus 4.8 is the latest model in Anthropic's Opus line, the tier aimed at the hardest reasoning and agentic-coding work. The API identifier is claude-opus-4-8, with no date suffix^[2].

The core specs:

Context window: 1M tokens, billed at the standard per-token rate across the whole window^[2].
Max output: 128K tokens on the synchronous API, up to 300K on the Batch API with a beta header^[2].
Input modalities: text and image. Output is text. Vision is supported; Anthropic does not list audio or video input^[2].
Knowledge cutoff: January 2026^[2].
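
The limits in the spec list can be turned into a small pre-flight check. This is a minimal sketch: the numbers come from the list above, while the names `MAX_OUTPUT` and `check_request`, and the idea of validating on the client side, are illustrative assumptions rather than anything Anthropic ships. Token counts are assumed to be supplied by the caller.

```python
# Published limits for claude-opus-4-8, taken from the spec list above.
CONTEXT_WINDOW = 1_000_000  # tokens
# The 300K batch figure requires a beta header, per the spec list.
MAX_OUTPUT = {"sync": 128_000, "batch": 300_000}


def check_request(input_tokens: int, max_output_tokens: int, mode: str = "sync") -> list[str]:
    """Return a list of problems; an empty list means the request fits.

    Hypothetical helper for illustration; not part of any SDK.
    """
    if mode not in MAX_OUTPUT:
        raise ValueError(f"unknown mode: {mode!r}")
    problems = []
    if input_tokens > CONTEXT_WINDOW:
        problems.append(
            f"input of {input_tokens:,} tokens exceeds the {CONTEXT_WINDOW:,}-token window"
        )
    if max_output_tokens > MAX_OUTPUT[mode]:
        problems.append(
            f"{max_output_tokens:,} output tokens exceeds the {MAX_OUTPUT[mode]:,}-token {mode} cap"
        )
    return problems


for issue in check_request(900_000, 200_000, mode="sync"):
    print(issue)
```

The same request passes with `mode="batch"`, because the Batch API cap is higher.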

What's new compared to Opus 4.7

Anthropic frames Opus 4.8 as building on 4.7 rather than fully replacing it, and ships it at the same input and output price^[1]. Three changes stand out.

Honesty. Opus 4.8 is about four times less likely than its predecessor to let a flaw in its own code pass without flagging it, and it is more willing to mark uncertainty instead of asserting unsupported claims^[1].
A cheaper, faster fast mode. Fast mode now runs at $10 per million input tokens and $50 per million output, which Anthropic says is three times cheaper and 2.5 times faster than the previous generation's fast tier^[1].
Effort control. The model defaults to high effort and can be pushed to extra or max, spending more tokens for better results on hard problems^[2].

Core capabilities of Claude Opus 4.8

A 1M-token context window

Opus 4.8 reads up to a million tokens in one request, enough for a large codebase or a long document set, and a 900K-token request is billed at the same per-token rate as a 9K-token one^[2]. On Microsoft Foundry the window is capped at 200K^[2].
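
To make the flat-rate claim concrete, here is a short worked example. It is a sketch under stated assumptions: the $5 per million input rate is taken from the pricing table later in this article, and the platform labels in `CONTEXT_CAP` are illustrative keys, not official platform identifiers.

```python
INPUT_PRICE_PER_MTOK = 5.00  # USD, standard input rate from the pricing table

# Context caps mentioned in the article. The keys are illustrative labels.
CONTEXT_CAP = {"anthropic": 1_000_000, "microsoft_foundry": 200_000}


def input_cost(tokens: int) -> float:
    """Input cost in USD at one flat per-token rate, whatever the request size."""
    return tokens * INPUT_PRICE_PER_MTOK / 1_000_000


def fits(tokens: int, platform: str) -> bool:
    """True if the request fits the platform's context cap."""
    return tokens <= CONTEXT_CAP[platform]


small, large = 9_000, 900_000
print(f"{small:,} tokens: ${input_cost(small):.3f}")
print(f"{large:,} tokens: ${input_cost(large):.3f}")
print("900K fits on Foundry:", fits(large, "microsoft_foundry"))
```

The larger request has 100 times the tokens and costs 100 times as much, with no long-context surcharge, but it would not fit under the 200K cap on Microsoft Foundry.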

Agentic coding and reasoning

The model leads Anthropic's published comparison on agentic coding and multidisciplinary reasoning. It scores 69.2% on SWE-Bench Pro and 57.9% on Humanity's Last Exam with tools, both ahead of Opus 4.7^[1]. Full numbers are in the benchmark table below.

Effort control and adaptive thinking

Rather than a single extended-thinking toggle, Opus 4.8 uses adaptive thinking plus an effort setting. Leaving effort at the default high suits most work; raising it to extra or max trades more tokens for deeper reasoning on the hardest tasks^[2].
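
One way to apply this in practice is to pick the effort level per task instead of hard-coding it. The sketch below uses only the three level names mentioned above; the 1 to 10 difficulty scale, the thresholds, and the `pick_effort` helper are assumptions for illustration. How the chosen value is passed in an API request is not covered here, so check the API reference for the exact field.

```python
# Effort levels named in this article, ordered from cheapest to most thorough.
EFFORT_LEVELS = ("high", "extra", "max")


def pick_effort(difficulty: int) -> str:
    """Map a caller-supplied difficulty score (1-10) to an effort level.

    The scale and thresholds are hypothetical; tune them against your own tasks.
    """
    if not 1 <= difficulty <= 10:
        raise ValueError("difficulty must be between 1 and 10")
    if difficulty <= 6:
        return "high"   # the default, suits most work
    if difficulty <= 8:
        return "extra"  # more tokens for harder problems
    return "max"        # most tokens, deepest reasoning


for score in (3, 7, 10):
    print(score, pick_effort(score))
```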

Vision, tool use, and prompt caching

Opus 4.8 accepts image input alongside text, supports tool use and function calling, and works with prompt caching to cut the cost of repeated context^[2]. These are the building blocks for agents that read screenshots, call tools, and carry long context across a session.
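
As an illustration of those building blocks, the sketch below constructs an image message and a tool definition in the OpenAI Chat Completions format. It only builds the payloads and sends nothing. The tool `get_build_status` and its parameter are hypothetical, and whether a given OpenAI-compatible gateway forwards image parts and tools to Claude unchanged is an assumption to verify against that gateway's documentation.

```python
import base64


def image_message(prompt: str, png_bytes: bytes) -> dict:
    """Build a user message carrying text plus one inline PNG (OpenAI chat format)."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded}"}},
        ],
    }


# A tool definition in the OpenAI function-calling schema.
# The tool name and its parameter are hypothetical.
BUILD_STATUS_TOOL = {
    "type": "function",
    "function": {
        "name": "get_build_status",
        "description": "Look up the CI status for a commit.",
        "parameters": {
            "type": "object",
            "properties": {"commit_sha": {"type": "string"}},
            "required": ["commit_sha"],
        },
    },
}

# Placeholder bytes stand in for a real screenshot.
msg = image_message("What error does this screenshot show?", b"\x89PNG placeholder")
print(msg["content"][0]["text"])
```

With the OpenAI SDK, these would be passed as `messages=[msg]` and `tools=[BUILD_STATUS_TOOL]` on a chat completion call.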

Claude Opus 4.8 benchmarks

These are Anthropic's own published results, comparing Opus 4.8 with Opus 4.7, GPT-5.5, and Gemini 3.1 Pro^[1]. All figures are percentages except GDPval-AA, which is reported as a score. As with any vendor-run benchmark, treat them as a directional signal rather than a neutral audit.

Benchmark	Opus 4.8	Opus 4.7	GPT-5.5	Gemini 3.1 Pro
Agentic coding (SWE-Bench Pro)	69.2%	64.3%	58.6%	54.2%
Terminal coding (Terminal-Bench 2.1)	74.6%	66.1%	78.2%	70.3%
Reasoning (Humanity's Last Exam, no tools)	49.8%	46.9%	41.4%	44.4%
Reasoning (Humanity's Last Exam, with tools)	57.9%	54.7%	52.2%	51.4%
Computer use (OSWorld-Verified)	83.4%	82.8%	78.7%	76.2%
Knowledge work (GDPval-AA)	1890	1753	1769	1314
Financial analysis (Finance Agent v2)	53.9%	51.5%	51.8%	43.0%

Opus 4.8 tops the table on every row except Terminal-Bench 2.1, where GPT-5.5 leads at 78.2%^[1].
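
The table can be checked mechanically. The short script below copies the published scores, finds the leader on each row, and computes the gain of Opus 4.8 over Opus 4.7. The row labels are shortened for display and the variable names are illustrative.

```python
MODELS = ("Opus 4.8", "Opus 4.7", "GPT-5.5", "Gemini 3.1 Pro")

# Scores copied from the table above, in the same column order as MODELS.
# All rows are percentages except GDPval-AA, which is a score.
SCORES = {
    "SWE-Bench Pro": (69.2, 64.3, 58.6, 54.2),
    "Terminal-Bench 2.1": (74.6, 66.1, 78.2, 70.3),
    "HLE (no tools)": (49.8, 46.9, 41.4, 44.4),
    "HLE (with tools)": (57.9, 54.7, 52.2, 51.4),
    "OSWorld-Verified": (83.4, 82.8, 78.7, 76.2),
    "GDPval-AA": (1890, 1753, 1769, 1314),
    "Finance Agent v2": (53.9, 51.5, 51.8, 43.0),
}


def leader(row: tuple) -> str:
    """Name of the model with the highest score in a row."""
    best_index = max(range(len(row)), key=row.__getitem__)
    return MODELS[best_index]


leaders = {name: leader(row) for name, row in SCORES.items()}
# Gain of Opus 4.8 (column 0) over Opus 4.7 (column 1), rounded to one decimal.
gains = {name: round(row[0] - row[1], 1) for name, row in SCORES.items()}

for name in SCORES:
    print(f"{name:<20} leader: {leaders[name]:<15} 4.8 vs 4.7: {gains[name]:+}")
```

On these numbers the gain over Opus 4.7 ranges from 0.6 points on OSWorld-Verified to 8.5 points on Terminal-Bench 2.1, the one row where Opus 4.8 still trails GPT-5.5.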

Claude Opus 4.8 pricing

Pricing is per million tokens (MTok), unchanged from Opus 4.7^[1]^[3]:

Item	Price / MTok
Input	$5
Output	$25
Cache read	$0.50
Cache write (5-minute)	$6.25
Cache write (1-hour)	$10
Batch input	$2.50
Batch output	$12.50
Fast mode input	$10
Fast mode output	$50

The Batch API runs at half the standard input and output price, and prompt caching makes repeated context cheap to reread at $0.50 per million tokens^[3].
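
A small estimator makes these rates easier to compare. This is a sketch, not a billing tool: the prices are copied from the table above, the function names are illustrative, and the caching model assumes one 5-minute cache write followed by cache hits on every later call, which will not hold if the cache expires between requests.

```python
# USD per million tokens, copied from the pricing table above.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "batch": {"input": 2.50, "output": 12.50},
    "fast": {"input": 10.00, "output": 50.00},
}
CACHE_READ = 0.50
CACHE_WRITE_5M = 6.25


def request_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Cost in USD of one request on the given pricing tier."""
    p = PRICES[tier]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def cached_prefix_cost(prefix_tokens: int, calls: int) -> float:
    """Cost of a shared prefix over `calls` requests: one cache write, then reads.

    Assumes every call after the first hits the cache before it expires.
    """
    if calls < 1:
        return 0.0
    return prefix_tokens * (CACHE_WRITE_5M + CACHE_READ * (calls - 1)) / 1_000_000


def uncached_prefix_cost(prefix_tokens: int, calls: int) -> float:
    """Cost of resending the same prefix at the standard input rate every time."""
    return prefix_tokens * PRICES["standard"]["input"] * calls / 1_000_000


for tier in PRICES:
    print(f"{tier:<9} ${request_cost(200_000, 4_000, tier):.2f}")
print(f"cached   ${cached_prefix_cost(100_000, 10):.3f}")
print(f"uncached ${uncached_prefix_cost(100_000, 10):.3f}")
```

Under these assumptions caching pays for itself on the second call: a single call costs more with a cache write ($6.25 versus $5 per MTok), but two calls cost $6.75 versus $10 per MTok.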

How to access Claude Opus 4.8

You can call Claude Opus 4.8 directly through Anthropic, or reach it through reAPI's OpenAI-compatible gateway at Anthropic's standard rates. See the reapi.ai/models/claude-opus-4-8 page for live rates and a playground. The advantage of the gateway is one key and one client across providers: the same integration that calls GPT-5 and Gemini, plus reAPI's image and video models, also calls Claude Opus 4.8.

Point the OpenAI SDK at reAPI and set the model:

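from openai import OpenAI

# Point the standard OpenAI client at reAPI's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_REAPI_KEY",
    base_url="https://api.reapi.ai/v1",
)

resp = client.chat.completions.create(
    model="claude-opus-4-8",  # Anthropic's API identifier, no date suffix
    messages=[
        {"role": "user", "content": "Refactor this function and explain the change."},
    ],
    extra_body={"group": "default"},  # reAPI-specific field, sent as-is in the request body
)
# The reply text is on the first choice.
print(resp.choices[0].message.content)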

The native Anthropic /v1/messages surface is available too, so SDKs written for either format work without a rewrite. Input runs $5 per million tokens and output $25, the same rates Anthropic charges directly.
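
For the native format, a request can be sketched with the standard library alone. The body fields and headers below follow Anthropic's Messages API. The assumptions are that reAPI serves it at `https://api.reapi.ai/v1/messages` and accepts the key in the `x-api-key` header, so check the gateway's documentation before relying on either. The helper `build_messages_request` is hypothetical, and it builds the request without sending it.

```python
import json
import urllib.request


def build_messages_request(
    api_key: str,
    prompt: str,
    base_url: str = "https://api.reapi.ai",  # assumed host for the native surface
) -> urllib.request.Request:
    """Build (but do not send) a POST to an Anthropic-style /v1/messages endpoint."""
    body = {
        "model": "claude-opus-4-8",
        "max_tokens": 1024,  # the Messages API requires an explicit output cap
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,  # Anthropic's auth header; a gateway may differ
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


req = build_messages_request("YOUR_REAPI_KEY", "Refactor this function and explain the change.")
print(req.get_method(), req.full_url)
```

Sending it is a matter of passing `req` to `urllib.request.urlopen` and parsing the JSON response. In Anthropic's format the reply text sits in the `content` list of the response rather than under `choices`.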

Who should use Claude Opus 4.8

Opus 4.8 is built for work where a mistake is expensive and context is large:

Agentic coding. Long-horizon tasks where the model plans, edits, and verifies its own work, and where the honesty gains reduce silent bugs.
Large-context analysis. Reading an entire codebase or document set in one request, up to a million tokens.
Tool-using agents. Workflows that call tools, read screenshots, and carry state across many steps.
High-stakes reasoning. Research, financial analysis, and other work where flagged uncertainty beats a confident wrong answer.

For high-volume, low-complexity calls, a smaller and cheaper model is usually the better fit; Opus is the tier you reach for when capability matters more than cost per token.

Why Claude Opus 4.8 matters in 2026

The interesting part of Opus 4.8 is not a single benchmark. It is that Anthropic chose to spend a release on reliability, cutting how often the model lets its own mistakes slide, rather than chasing a leaderboard. For agentic coding, where a model edits real code across many steps, that kind of honesty compounds: fewer silent bugs to catch downstream. Paired with a million-token window and a cheaper fast mode, Opus 4.8 is aimed squarely at teams building agents they have to trust.

FAQ

When was Claude Opus 4.8 released?

May 28, 2026, as an upgrade to Claude Opus 4.7, at the same price^[1].

What is the Claude Opus 4.8 context window?

1M tokens, billed at the standard per-token rate across the full window. Maximum output is 128K tokens, or up to 300K on the Batch API^[2].

How much does Claude Opus 4.8 cost?

$5 per million input tokens and $25 per million output, with cache reads at $0.50 and the Batch API at half price^[3]. Fast mode is $10 input and $50 output^[1].

Is Claude Opus 4.8 multimodal?

It accepts text and image input and returns text, with vision supported. Anthropic does not list audio or video input for the model^[2].

How is Claude Opus 4.8 different from Opus 4.7?

The price is the same. Opus 4.8 is more honest (about four times less likely to let flaws in its own code pass unflagged), has a cheaper and faster fast mode, and posts small gains across Anthropic's benchmark suite^[1].

How do I call Claude Opus 4.8 with the OpenAI SDK?

Point the OpenAI client at https://api.reapi.ai/v1, set model to claude-opus-4-8, and send a normal chat request. reAPI exposes Claude through an OpenAI-compatible endpoint at Anthropic's standard rates.

The short version

Claude Opus 4.8 is an incremental but real upgrade: the same price as Opus 4.7, a million-token context, and a deliberate focus on honesty that matters most for agentic coding. If you are building agents that edit code or reason over large context, Claude Opus 4.8 is the current top of Anthropic's line, and you can reach it through an OpenAI-compatible API alongside the other models you already call through the same client.

References

[1] Anthropic. Introducing Claude Opus 4.8. Retrieved May 2026 from anthropic.com/news/claude-opus-4-8
[2] Anthropic. Models overview: context, output, modality, and capabilities. Retrieved May 2026 from platform.claude.com/docs/en/about-claude/models/overview
[3] Anthropic. Pricing: token, cache, and batch rates. Retrieved May 2026 from platform.claude.com/docs/en/about-claude/pricing