minimax-m3
MiniMax M3 is an open-weight model that pairs frontier-level coding and
agentic benchmark results with a 1M-token context window and native
multimodal input. It is exposed through api.reapi.ai as a drop-in,
OpenAI-compatible Chat Completions endpoint that supports 512K max output,
native thinking, image and video input, prompt caching, and tool use. The
wire model id is minimax/minimax-m3. Current rates are listed on the model
page and at api.reapi.ai/pricing.
Quick example
curl https://api.reapi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax/minimax-m3",
    "group": "default",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "stream": true,
    "max_tokens": 4096,
    "temperature": 1.0
  }'

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai/v1",
)
stream = client.chat.completions.create(
    model="minimax/minimax-m3",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    max_tokens=4096,
    temperature=1.0,
    extra_body={"group": "default"},  # api.reapi.ai-specific extension
)
for chunk in stream:
    # Some chunks (for example a trailing usage chunk) may carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.reapi.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "minimax/minimax-m3",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  max_tokens: 4096,
  temperature: 1.0,
  // `group` is an api.reapi.ai-specific extension. It is not in the OpenAI
  // types, but the SDK serialises extra body fields into the request as-is.
  // @ts-expect-error: not part of the OpenAI types
  group: "default",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

package main
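
import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	body, err := json.Marshal(map[string]any{
		"model": "minimax/minimax-m3",
		"group": "default",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello"},
		},
		"stream":      true,
		"max_tokens":  4096,
		"temperature": 1.0,
	})
	if err != nil {
		log.Fatal(err)
	}
	req, err := http.NewRequest("POST",
		"https://api.reapi.ai/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Fatalf("unexpected status: %s", resp.Status)
	}
	// With "stream": true the body is SSE: print each data: payload until [DONE].
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		data := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if data == "[DONE]" {
			break
		}
		fmt.Println(data)
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}

Authentication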
Every request needs a Bearer token. The MiniMax M3 chat workspace lives on
the api.reapi.ai platform — sign in there to create a key and top up tokens.
- Open api.reapi.ai and sign in (or create an account).
- Generate an API key under API Keys.
- Top up tokens under Top Up (pay-as-you-go, billed in USD per 1M tokens — see api.reapi.ai/pricing).
Authorization: Bearer YOUR_API_KEY

The chat surface (api.reapi.ai) is a separate workspace from the
image/video/audio task gateway at reapi.ai/api/v1/*. Keys and balances
do not cross over — a key issued on reapi.ai/settings/apikeys will not
authenticate against api.reapi.ai/v1/chat/completions, and vice versa.
Endpoint
POST https://api.reapi.ai/v1/chat/completions

Drop-in for the OpenAI SDKs: same request shape, same SSE wire format. Set
base_url to https://api.reapi.ai/v1 and model to minimax/minimax-m3.
Request body
model — string, required
Must be "minimax/minimax-m3". Echoed back in the response envelope.
messages — array, required
Conversation history as an array of message objects. Same shape as the OpenAI Chat Completions spec, plus content-parts for image and video input:
{
  "role": "system" | "user" | "assistant" | "tool",
  "content": "string OR content-parts array (text + image_url + video_url parts)"
}

Multi-turn history is sent in chronological order; the last message is the
one the model responds to. Strip reasoning_content from earlier assistant
turns before re-sending them in messages.
max_tokens — integer, default 4096
Upper bound on output tokens for this response, including the chain-of-thought when MiniMax M3 is thinking. The synchronous API supports up to 512K output tokens (128K recommended) — set it generously for long-form or reasoning-heavy outputs.
stream — boolean, default false
When true, the response is streamed as server-sent events (SSE) with
Content-Type: text/event-stream. Each event is a JSON delta in the OpenAI
format, terminated by a data: [DONE] line.
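Without an OpenAI SDK, the stream can be consumed by reading the body line by line and decoding each data: payload until the data: [DONE] sentinel. The Python sketch below assumes the body has already been split into text lines; iter_sse_deltas is a hypothetical helper name, and it only extracts delta.content (tool-call and reasoning deltas are ignored).

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta.content strings from OpenAI-format SSE lines."""
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; lines starting with ":" are SSE comments.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel; nothing after it is read
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content
```

With the requests library, for example, you could pass resp.iter_lines(decode_unicode=True) from a response opened with stream=True.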
temperature — number, default 1
Sampling temperature. Lower values produce more deterministic output.
top_p — number, default 0.95
Nucleus sampling cutoff.
tools / tool_choice — optional
Standard OpenAI tool-calling parameters. MiniMax M3 is tuned for agentic, multi-step workflows with reliable function calling and JSON output, and it can interleave reasoning with tool calls across a long run.
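After a response that ends with finish_reason "tool_calls", your code runs each requested function and sends the results back as role "tool" messages. A minimal Python sketch, assuming messages are plain dicts in the OpenAI Chat Completions shape; run_tool_calls and the tools registry are hypothetical names. In line with this page's advice to strip reasoning_content before re-sending assistant turns, it drops that field from the echoed assistant message.

```python
import json
from typing import Any, Callable


def run_tool_calls(
    assistant_message: dict[str, Any],
    tools: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Return the assistant turn plus one role="tool" message per tool call."""
    # Echo the assistant turn (it carries tool_calls) without its chain-of-thought.
    echoed = {k: v for k, v in assistant_message.items() if k != "reasoning_content"}
    follow_up: list[dict[str, Any]] = [echoed]
    for call in assistant_message.get("tool_calls") or []:
        # An unknown tool name raises KeyError; handle it as your app requires.
        fn = tools[call["function"]["name"]]
        # OpenAI-format tool arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"].get("arguments") or "{}")
        result = fn(**args)
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return follow_up
```

Append the returned messages to the conversation and repeat the request with the same tools, so the model can answer from the tool output.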
group — string, default "default"
api.reapi.ai-specific extension. Selects a token group on the gateway, which routes the request to a specific upstream channel pool. Omit if default routing is fine.
Thinking
MiniMax M3 is a native thinking model: it reasons before it answers and can
interleave reasoning with tool calls during a multi-step run. Thinking is
adaptive by default — the model reasons on hard tasks and answers
directly on simple ones. When the model thinks, the chain-of-thought is
returned in a reasoning_content field alongside content:
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "Let me work through this step by step...",
        "content": "The final answer."
      },
      "finish_reason": "stop"
    }
  ]
}

For latency-sensitive or simple calls you can disable thinking for faster, cheaper responses.
Strip reasoning_content from assistant messages before sending them back
in a follow-up request — the chain-of-thought from a previous turn is not
meant to be re-fed as input.
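Stripping the field is a dictionary filter per assistant turn. A minimal Python sketch, assuming the history is kept as plain dicts; strip_reasoning is a hypothetical helper, not an SDK function.

```python
from typing import Any


def strip_reasoning(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of the history with reasoning_content removed from assistant turns."""
    cleaned = []
    for message in messages:
        if message.get("role") == "assistant" and "reasoning_content" in message:
            # Build a new dict so the caller's original history is left untouched.
            message = {k: v for k, v in message.items() if k != "reasoning_content"}
        cleaned.append(message)
    return cleaned
```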
Multimodal input
MiniMax M3 is natively multimodal — send images and video alongside text via OpenAI content-parts:
{
  "model": "minimax/minimax-m3",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What does this chart show?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/chart.png" }
        }
      ]
    }
  ]
}

Video frames are passed the same way via video_url content-parts. Each
image or video counts toward the input token budget based on its resolution
and length.
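A small Python helper for assembling those content-parts messages. It is a sketch: media_message is a hypothetical name, and the video_url part is assumed to mirror the image_url shape ({"url": ...}), since this page only says video is passed the same way.

```python
from typing import Any, Sequence


def media_message(
    text: str,
    image_urls: Sequence[str] = (),
    video_urls: Sequence[str] = (),
) -> dict[str, Any]:
    """Build a user message: one text part, then image_url and video_url parts."""
    parts: list[dict[str, Any]] = [{"type": "text", "text": text}]
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": url}})
    for url in video_urls:
        # Assumption: video parts mirror the image_url part shape.
        parts.append({"type": "video_url", "video_url": {"url": url}})
    return {"role": "user", "content": parts}
```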
Prompt caching
MiniMax M3 caches stable prompt prefixes. When a request reuses a cached
prefix, those input tokens bill at a small fraction of the standard input
rate — a big saving for agent loops and chatbots that replay long system
prompts and tool schemas. The
usage.prompt_tokens_details.cached_tokens field reports how many input
tokens were served from cache.
Response shape
Non-streaming (stream: false)
{
  "id": "chatcmpl-018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "object": "chat.completion",
  "created": 1735000000,
  "model": "minimax/minimax-m3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

When the model thinks, message.reasoning_content carries the
chain-of-thought alongside content.
Streaming (stream: true)
Content-Type: text/event-stream. Each data: line is a JSON delta in the
OpenAI chunk format; the final event before [DONE] carries the
finish_reason (stop / length / tool_calls / content_filter).
Pricing
MiniMax M3 is billed pay-as-you-go in USD against your api.reapi.ai token balance. It bills along three dimensions — input tokens, output tokens, and cache-read tokens. Current rates live on api.reapi.ai/pricing and in the pricing card at the top of the model page.
Per-call bill:
input_bill      = (prompt_tokens - cached_tokens) × input_rate / 1,000,000
cache_read_bill = cached_tokens × cache_read_rate / 1,000,000
output_bill     = completion_tokens × output_rate / 1,000,000
total_bill      = input_bill + cache_read_bill + output_bill

Output tokens include the chain-of-thought when the model is thinking. Failed requests are not charged.
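The per-call formula above can be applied directly to the usage object of a response. A Python sketch; call_cost_usd is a hypothetical helper, and the rates are parameters you supply in USD per 1M tokens from api.reapi.ai/pricing (no real prices are assumed here).

```python
def call_cost_usd(
    usage: dict,
    input_rate: float,
    output_rate: float,
    cache_read_rate: float,
) -> float:
    """Per-call cost in USD; all rates are USD per 1M tokens."""
    prompt_tokens = usage["prompt_tokens"]
    completion_tokens = usage["completion_tokens"]
    # cached_tokens may be absent or null when nothing was served from cache.
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens") or 0

    # Uncached input bills at the full input rate.
    input_bill = (prompt_tokens - cached_tokens) * input_rate / 1_000_000
    # Cached prefix tokens bill at the cache-read rate instead.
    cache_read_bill = cached_tokens * cache_read_rate / 1_000_000
    # completion_tokens already includes any chain-of-thought tokens.
    output_bill = completion_tokens * output_rate / 1_000_000
    return input_bill + cache_read_bill + output_bill
```

For example, with placeholder rates of 1.0 (input), 4.0 (output) and 0.1 (cache read), a call with 1,000,000 prompt tokens of which 400,000 were cached, plus 100,000 completion tokens, comes to 0.60 + 0.04 + 0.40 = 1.04 USD.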
Limits
| Limit | Value |
|---|---|
| Context window | 1M tokens |
| Max output per call | 512K tokens |
Streams that hit the output cap finish with finish_reason: "length";
call again with a continuation message if you need more text.
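One way to implement that continuation loop: feed the truncated answer back as an assistant turn, ask the model to continue, and stop once finish_reason is anything other than "length". A Python sketch; complete is a hypothetical callable you would wire to client.chat.completions.create, and the wording of the continuation prompt is an assumption, not a documented convention.

```python
from __future__ import annotations

from typing import Callable

CONTINUE_PROMPT = "Continue exactly where you left off."  # assumed wording


def complete_with_continuation(
    complete: Callable[[list[dict]], tuple[str, str]],
    messages: list[dict],
    max_rounds: int = 4,
) -> str:
    """Call `complete` until finish_reason is not "length", joining the pieces."""
    history = list(messages)  # never mutate the caller's list
    pieces: list[str] = []
    for _ in range(max_rounds):
        content, finish_reason = complete(history)
        pieces.append(content)
        if finish_reason != "length":
            break
        # Echo the truncated answer and ask the model to carry on from there.
        history = history + [
            {"role": "assistant", "content": content},
            {"role": "user", "content": CONTINUE_PROMPT},
        ]
    return "".join(pieces)
```

For instance, complete could call client.chat.completions.create(model="minimax/minimax-m3", messages=history, max_tokens=...) and return (choice.message.content or "", choice.finish_reason) for choice = resp.choices[0].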
Errors
The error envelope follows the OpenAI shape — HTTP status, plus a JSON body:
{
  "error": {
    "message": "...",
    "type": "invalid_request_error",
    "code": "..."
  }
}

Common cases:

| Status | When | Notes |
|---|---|---|
| 400 | Bad request shape, unsupported param combo | Check the messages array and model id |
| 401 | Missing / invalid API key | Re-issue a key at api.reapi.ai |
| 402 | Insufficient balance | Top up at api.reapi.ai |
| 429 | Per-group rate limit hit | Back off, or move to a different group |
| 500 | Upstream / gateway error | Safe to retry; failed calls are not charged |
api.reapi.ai does not internally retry chat requests. Every customer call maps to exactly one upstream POST. If a network error reaches you, that is a one-for-one wire failure and a retry from your side is safe; the gateway will not double-bill.
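Because a client-side retry cannot double-bill, a retry loop with exponential backoff is a reasonable pattern for 429 and 5xx responses and network errors. A Python sketch; ChatHTTPError and call_with_retry are hypothetical names, treating every 5xx (not only 500) as retryable is an assumption, and 400/401/402 are raised immediately since repeating them will not help.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ChatHTTPError(Exception):
    """Hypothetical error carrying the HTTP status of a failed chat request."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


def is_retryable(err: Exception) -> bool:
    # Network failures and 429/5xx are worth retrying; 400/401/402 are not.
    if isinstance(err, ChatHTTPError):
        return err.status == 429 or err.status >= 500
    return isinstance(err, (ConnectionError, TimeoutError))


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as err:
            if not is_retryable(err) or attempt == max_attempts:
                raise
            # Exponential backoff with full jitter: wait 0..base_delay * 2^(attempt-1) s.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise RuntimeError("max_attempts must be at least 1")
```

The OpenAI Python SDK also retries some failures on its own, controlled by the client's max_retries option; if you wrap SDK calls in a loop like this, consider lowering that setting so the two retry layers do not multiply.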
Recipes
Minimum request
{
  "model": "minimax/minimax-m3",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "Summarise this in three sentences." }
  ]
}

Tool use (function calling)
{
  "model": "minimax/minimax-m3",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "What's the weather in Tokyo today?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

Vision
{
  "model": "minimax/minimax-m3",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Read the error in this screenshot and suggest a fix." },
        {
          "type": "image_url",
          "image_url": { "url": "https://your-cdn.com/screenshot.png" }
        }
      ]
    }
  ]
}

Long-context analysis
{
  "model": "minimax/minimax-m3",
  "max_tokens": 8192,
  "messages": [
    { "role": "system", "content": "<a long, stable reference document>" },
    { "role": "user", "content": "List every mention of constraint X with line numbers." }
  ]
}

Keep the long reference block stable across calls so the cache-read rate applies on subsequent requests.
When to pick MiniMax M3
Pick MiniMax M3 when you want frontier coding and agentic capability at open-weight pricing:
- Long-horizon agentic coding — multi-file refactors, tool-using agents, and runs that must stay on-task across many steps.
- Million-token analysis — whole repositories, long research packs, and multi-document review in a single call.
- Multimodal workflows — tasks that mix screenshots, diagrams, video, and code in one conversation.
Route lighter traffic (classification, short replies, tight loops) to a cheaper model on the same key.
Tips
- Set max_tokens generously when the task is hard. The chain-of-thought counts toward the output budget; a low cap can truncate before the final answer.
- Strip reasoning_content before the next turn. Re-feeding a prior turn's chain-of-thought as input is not supported.
- Stream by default for chat UX. Streaming cuts perceived latency.
- Cache stable prefixes. Reuse the same system prompt and tool schemas across calls to bill repeated input at the low cache-read rate.
- Disable thinking for simple, latency-sensitive calls. Adaptive thinking already skips reasoning on easy prompts, but you can force it off when you never need the chain-of-thought.