Does MiniMax M3 use the standard chat-completions format?

Yes. The MiniMax M3 is a drop-in for OpenAI's /v1/chat/completions — same request shape, same `messages` array, same `stream` / `temperature` / `max_tokens` parameters, same SSE wire format. Most teams migrate by changing the base URL to https://api.reapi.ai/v1, swapping the API key, and setting `model` to `minimax/minimax-m3`.

What is the MiniMax M3 context window and max output?

The MiniMax M3 defaults to a 1M-token context window (with a guaranteed minimum of 512K) and supports up to 512K output tokens per response, with 128K recommended. MiniMax Sparse Attention keeps long-context inference efficient, so you can feed entire repositories and long documents without chunking.

Is MiniMax M3 a reasoning model?

Yes. MiniMax M3 is a native thinking model that reasons before it answers and supports interleaved thinking during tool use. Thinking is adaptive by default — the model reasons on hard tasks and answers directly on simple ones — and you can disable it for the fastest, cheapest replies. The model id stays the same either way.

Does MiniMax M3 support vision and tool use?

Yes. The MiniMax M3 is natively multimodal: it accepts image and video inputs alongside text in the same call, and it supports function calling / tool use with JSON output. It is tuned for agentic, multi-step workflows that mix vision, retrieval, and code.

How does MiniMax M3 compare to DeepSeek V4 and other frontier models?

MiniMax M3 and DeepSeek V4 are both open-weight, value-priced models with 1M context, thinking, and tool use. MiniMax positions M3 around frontier coding and agentic benchmarks plus native image-and-video multimodality; MiniMax reports M3 in range of top closed-source models on software-engineering tasks. The versus table on this page breaks down the differences. All on one api.reapi.ai key, so you can A/B them per request.

Does MiniMax M3 support prompt caching?

Yes. The MiniMax M3 caches stable prompt prefixes, and cache reads bill at a small fraction of the standard input rate. Reuse the same system prompt and tool schemas across calls and the discount applies automatically to the repeated tokens — a large saving for long-context agents and chatbots.

Where do I create an API key and buy tokens?

Both happen on api.reapi.ai — the chat workspace runs as its own platform separate from the image / video task gateway at reapi.ai. Sign up at api.reapi.ai, generate a key under API Keys, and top up under Top Up. A reapi.ai/settings/apikeys key will not authenticate against the MiniMax M3 chat endpoint.

MiniMax M3 — Frontier Coding, 1M Context

MiniMax M3 is an open-weight model that pairs frontier coding and agentic benchmarks with a 1M-token context window and native multimodal input. MiniMax M3 reasons before it answers, calls tools across long-horizon runs, and reads images and video in the same call — exposed on api.reapi.ai as a drop-in unified endpoint. Pay-as-you-go in USD at a fraction of closed-source frontier rates.

MiniMax M3modelminimax/minimax-m3

MiniMax M3 playground

Open the chat playground to run MiniMax M3 through the standard chat completions surface with your api.reapi.ai key.

Open chat playground

What you can build with this model

Real-world workflows and production use cases you can build and ship with this model.

The MiniMax M3 driving a long-horizon agentic coding session

Long-horizon agentic coding and software engineering

Agentic coding is the headline of the MiniMax M3. MiniMax reports frontier-level results on software-engineering benchmarks — 59.0% on SWE-Bench Pro and 66.0% on Terminal-Bench 2.1 — putting MiniMax M3 in range of the top closed-source coding models while staying open-weight. Point a coding agent at the MiniMax M3 and it scopes the task, calls tools, reasons through multi-step work, and self-corrects across a long run, all in one session.

Read the API docs

The MiniMax M3 reasoning across a million-token analysis pack

Million-token codebase and document analysis

The MiniMax M3 defaults to a 1M-token context window — enough to load a whole mid-size repository, a long research pack, or a multi-turn agent trace in a single call. MiniMax Sparse Attention keeps long-context inference efficient, so MiniMax M3 workloads like architecture review, dependency audits, and migration planning rarely need chunking. Stable prompt prefixes hit the low cache-read rate on every repeat.

The MiniMax M3 combining image, video, and tool use in one call

Native multimodal understanding and tool use

The MiniMax M3 is multimodal from the ground up: send images and video alongside text in the same Chat Completions call — screenshots, diagrams, document scans, and clips — and the model reasons over all of it. Combined with reliable function calling and JSON output, the MiniMax M3 drives browser agents, document pipelines, and tool-using workflows that mix vision, retrieval, and code.

Pricing

Credit-based — 1 credit = $0.001 USD. Pay only for completed generations.

Category	Unit	Price
Token pricing
Input	1M tokens	$0.6
Output	1M tokens	$2.4
Cache read	1M tokens	$0.12

Why reAPI

Drop-in access

The MiniMax M3 speaks OpenAI Chat Completions verbatim. Moving an existing OpenAI integration to MiniMax M3 is a base URL, an API key, and a model-string change — `minimax/minimax-m3` — not a platform rewrite. The same `messages` array, the same streaming format, the same tool-calling shape.

Frontier coding at value pricing

MiniMax M3 is open-weight and priced to match. It posts frontier coding and agentic benchmarks while costing a fraction of closed-source models per token — and prompt caching drops the price again on repeated context. Run premium agentic work without premium per-token bills.

One key across MiniMax, GPT, Claude, and Gemini

A single api.reapi.ai key unlocks the MiniMax M3 alongside GPT-5.5, Claude Opus 4.8, DeepSeek V4, Gemini, and every other frontier chat model on the platform. Compare vendors, add fallbacks, and route traffic per call with a configuration change instead of an integration project.

MiniMax M3 vs DeepSeek V4

MiniMax M3 and DeepSeek V4 are both open-weight, value-priced models with a 1M-token context window, thinking, and tool use. Here is how MiniMax M3 is positioned against DeepSeek V4 on the dimensions that matter for agentic and coding work.

Capability

MiniMax M3 on reAPI

DeepSeek V4

Positioning

Single open-weight model tuned for frontier coding, long-horizon agents, and native multimodality.

Two open-weight variants — Flash (fast / low cost) and Pro (flagship reasoning).

Context window

1M tokens by default, with a guaranteed 512K minimum and MiniMax Sparse Attention for efficient long context.

1M-token context window with DeepSeek Sparse Attention.

Max output

Up to 512K output tokens per response (128K recommended).

Up to 384K output tokens per response.

Thinking

Native thinking with interleaved reasoning during tool use; adaptive by default, can be disabled.

Thinking mode on by default, with a dual thinking / non-thinking switch.

Multimodal input

Native image and video understanding in the same Chat Completions call.

Image input supported (beta); text-and-image.

Agentic and coding focus

Vendor-reported frontier results on SWE-Bench Pro, Terminal-Bench, and agent benchmarks; tuned for long-horizon coding agents.

Dedicated agentic optimizations; open-source SOTA on agentic coding per DeepSeek.

Comparison reflects publicly documented behavior from MiniMax's M3 release notes and DeepSeek's V4 documentation at the time of writing. Benchmark figures are vendor-reported. Model behavior and pricing can change; check the pricing card above and the API docs for current values.

Ship the MiniMax M3 in three steps

step 01
Create an account and key on api.reapi.ai
Sign up at api.reapi.ai, open the console, generate an API key under API Keys, and top up tokens under Top Up. The chat workspace is separate from the reapi.ai image/video gateway — keys do not cross over.
Open
step 02
Send your first request
POST https://api.reapi.ai/v1/chat/completions with `model` set to `minimax/minimax-m3`, your `messages` array, and `max_tokens`. The MiniMax M3 endpoint uses the standard chat-completions format, including streamed responses, so most SDKs work with only a base URL change.
Open
step 03
Tune for reasoning and cost
MiniMax M3 thinks adaptively — it reasons when a task is hard and answers directly when it is not. Reuse stable system prompts and tool schemas across calls to hit the low cache-read rate, and set `max_tokens` high enough to fit the chain-of-thought on reasoning-heavy work.
Open

Frequently asked questions

Common questions about this model.

The MiniMax M3 is billed pay-as-you-go in USD against your api.reapi.ai token balance. The pricing card on this page shows the live per-1M-token input, output, and cache-read rates. Cache reads are dramatically cheaper than re-sending the same tokens, and failed requests are not charged.

Related models

Explore more models in the same category.

View all models

DeepSeek

DeepSeek V4

From $0.140 per 1M tokens

Chat

OpenAI

GPT-5.4

From $1.00 per 1M tokens

Chat

Anthropic

Claude Opus 4.7

From $2.00 per 1M tokens

Chat

Anthropic

Claude Sonnet 4.6

From $2.00 per 1M tokens

Chat

View all models

start building

Ready to ship?

Try it in the playground or grab an API key to integrate now.

Get API key View API docs

MiniMax M3 — Frontier Coding, 1M Context

What you can build with this model

Real-world workflows and production use cases you can build and ship with this model.

Long-horizon agentic coding and software engineering

Read the API docs

Million-token codebase and document analysis