reAPI Docs

minimax-m3

MiniMax M3 API: an open-weight frontier coding and agentic model, served from a single OpenAI-compatible /v1/chat/completions endpoint on api.reapi.ai.

MiniMax M3 is an open-weight model that pairs frontier-level coding and agentic benchmark results with a 1M-token context window and native multimodal input. It is exposed through api.reapi.ai as a drop-in OpenAI-compatible Chat Completions endpoint, with up to 512K output tokens, native thinking, image and video input, prompt caching, and tool use. The wire model id is minimax/minimax-m3. Current rates are listed on the model page and at api.reapi.ai/pricing.

Quick example

cURL

curl https://api.reapi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax/minimax-m3",
    "group": "default",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "stream": true,
    "max_tokens": 4096,
    "temperature": 1.0
  }'

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai/v1",
)

stream = client.chat.completions.create(
    model="minimax/minimax-m3",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    max_tokens=4096,
    temperature=1.0,
    extra_body={"group": "default"},
)

for chunk in stream:
    if not chunk.choices:
        continue  # a chunk may carry no choices (for example, a usage-only chunk)
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.reapi.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "minimax/minimax-m3",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  max_tokens: 4096,
  temperature: 1.0,
  // `group` is an api.reapi.ai-specific extension. It is passed as a
  // top-level field, which the SDK sends through in the request body.
  // @ts-expect-error: not part of the OpenAI types
  group: "default",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
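
Go

package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

func main() {
    // Build the JSON request body; "group" is an api.reapi.ai-specific extension.
    body, err := json.Marshal(map[string]any{
        "model": "minimax/minimax-m3",
        "group": "default",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello"},
        },
        "stream":      true,
        "max_tokens":  4096,
        "temperature": 1.0,
    })
    if err != nil {
        log.Fatal(err)
    }

    req, err := http.NewRequest("POST",
        "https://api.reapi.ai/v1/chat/completions", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    // Check the error before touching resp; resp is nil when the call fails.
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Print the raw SSE lines as they arrive, since stream is true above.
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        fmt.Println(scanner.Text())
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}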

Authentication

Every request needs a Bearer token. The MiniMax M3 chat workspace lives on the api.reapi.ai platform — sign in there to create a key and top up tokens.

  1. Open api.reapi.ai and sign in (or create an account).
  2. Generate an API key under API Keys.
  3. Top up tokens under Top Up (pay-as-you-go, billed in USD per 1M tokens — see api.reapi.ai/pricing).

Send the key in the Authorization header of every request:

Authorization: Bearer YOUR_API_KEY

The chat surface (api.reapi.ai) is a separate workspace from the image/video/audio task gateway at reapi.ai/api/v1/*. Keys and balances do not cross over — a key issued on reapi.ai/settings/apikeys will not authenticate against api.reapi.ai/v1/chat/completions, and vice versa.


Endpoint

POST https://api.reapi.ai/v1/chat/completions

Drop-in for the OpenAI SDKs — same request shape, same SSE wire format. Set base_url to https://api.reapi.ai/v1 and model to minimax/minimax-m3.


Request body

model — string, required

Must be "minimax/minimax-m3". Echoed back in the response envelope.

messages — array, required

Conversation history as an array of message objects. Same shape as the OpenAI Chat Completions spec, plus content-parts for image and video input:

{
  "role": "system" | "user" | "assistant" | "tool",
  "content": "string OR content-parts array (text + image_url + video_url parts)"
}

Multi-turn history is sent in chronological order — the last message is the one the model responds to. Strip a prior turn's reasoning content before re-sending it in messages.

max_tokens — integer, default 4096

Upper bound on output tokens for this response, including the chain-of-thought when MiniMax M3 is thinking. The synchronous API supports up to 512K output tokens (128K recommended) — set it generously for long-form or reasoning-heavy outputs.

stream — boolean, default false

When true, the response is streamed as server-sent events (SSE) with Content-Type: text/event-stream. Each event is a JSON delta in the OpenAI format, terminated by a data: [DONE] line.

temperature — number, default 1

Sampling temperature. Lower values produce more deterministic output.

top_p — number, default 0.95

Nucleus sampling cutoff.

tools / tool_choice — optional

Standard OpenAI tool-calling parameters. MiniMax M3 is tuned for agentic, multi-step workflows with reliable function calling and JSON output, and it can interleave reasoning with tool calls across a long run.

group — string, default "default"

api.reapi.ai-specific extension. Selects a token group on the gateway, which routes the request to a specific upstream channel pool. Omit if default routing is fine.


Thinking

MiniMax M3 is a native thinking model: it reasons before it answers and can interleave reasoning with tool calls during a multi-step run. Thinking is adaptive by default — the model reasons on hard tasks and answers directly on simple ones. When the model thinks, the chain-of-thought is returned in a reasoning_content field alongside content:

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "Let me work through this step by step...",
        "content": "The final answer."
      },
      "finish_reason": "stop"
    }
  ]
}

For latency-sensitive or simple calls you can disable thinking for faster, cheaper responses.

Strip reasoning_content from assistant messages before sending them back in a follow-up request — the chain-of-thought from a previous turn is not meant to be re-fed as input.
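
The clean-up step described above can be sketched as a small helper. This is a minimal illustration that assumes the conversation is held as plain dicts in the same shape as the JSON above; the name strip_reasoning is hypothetical and not part of any SDK.

```python
def strip_reasoning(messages):
    """Return a copy of messages with reasoning_content removed from assistant turns."""
    cleaned = []
    for message in messages:
        if message.get("role") == "assistant" and "reasoning_content" in message:
            # Copy every field except the chain-of-thought.
            message = {k: v for k, v in message.items() if k != "reasoning_content"}
        cleaned.append(message)
    return cleaned


history = [
    {"role": "user", "content": "What is 17 * 3?"},
    {
        "role": "assistant",
        "reasoning_content": "17 * 3 = 51.",
        "content": "51",
    },
    {"role": "user", "content": "And doubled?"},
]

# Use this list as the "messages" field of the follow-up request.
next_request_messages = strip_reasoning(history)
```

The helper builds new dicts rather than mutating the stored history, so the original reasoning stays available for logging or display.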


Multimodal input

MiniMax M3 is natively multimodal — send images and video alongside text via OpenAI content-parts:

{
  "model": "minimax/minimax-m3",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What does this chart show?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/chart.png" }
        }
      ]
    }
  ]
}

Video frames are passed the same way via video_url content-parts. Each image or video counts toward the input token budget based on its resolution and length.
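
A short sketch of building such a message programmatically may help. The image_url part follows the JSON shown above. The shape of the video_url part is an assumption here (it is taken to mirror image_url), and build_user_message is a hypothetical helper name.

```python
def build_user_message(text, image_urls=(), video_urls=()):
    """Build one user message whose content is a list of content-parts."""
    parts = [{"type": "text", "text": text}]
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": url}})
    for url in video_urls:
        # Assumption: video parts mirror the image_url shape.
        parts.append({"type": "video_url", "video_url": {"url": url}})
    return {"role": "user", "content": parts}


message = build_user_message(
    "Compare the chart with the clip.",
    image_urls=["https://example.com/chart.png"],
    video_urls=["https://example.com/clip.mp4"],
)
```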


Prompt caching

MiniMax M3 caches stable prompt prefixes. When a request reuses a cached prefix, those input tokens bill at a small fraction of the standard input rate — a big saving for agent loops and chatbots that replay long system prompts and tool schemas. The usage.prompt_tokens_details.cached_tokens field reports how many input tokens were served from cache.
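
To check whether a prefix is actually being reused, one option is to compute the cached share of the prompt from the usage object. The sketch below assumes the usage block is available as a dict in the shape shown in the response examples; cached_fraction is a hypothetical helper name and the token counts are made up.

```python
def cached_fraction(usage):
    """Return the share of prompt tokens served from cache (0.0 to 1.0)."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    if prompt_tokens == 0:
        return 0.0
    return cached / prompt_tokens


# Illustrative numbers only.
usage = {
    "prompt_tokens": 12000,
    "completion_tokens": 300,
    "total_tokens": 12300,
    "prompt_tokens_details": {"cached_tokens": 9000},
}
print(f"{cached_fraction(usage):.0%} of input tokens came from cache")
```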


Response shape

Non-streaming (stream: false)

{
  "id": "chatcmpl-018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "object": "chat.completion",
  "created": 1735000000,
  "model": "minimax/minimax-m3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

When the model thinks, message.reasoning_content carries the chain-of-thought alongside content.

Streaming (stream: true)

Content-Type: text/event-stream. Each data: line is a JSON delta in the OpenAI chunk format; the final event before [DONE] carries the finish_reason (stop / length / tool_calls / content_filter).
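
For clients that read the stream without an SDK, the wire format can be folded into text roughly as follows. This is a sketch under the assumption that each event is a single data: line and that the request asked for one choice; collect_stream is a hypothetical helper, and the sample lines are hand-written rather than captured from the API.

```python
import json


def collect_stream(lines):
    """Fold SSE lines into (text, finish_reason)."""
    text_parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text_parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(text_parts), finish_reason


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text, reason = collect_stream(sample)
print(text, reason)  # Hello stop
```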


Pricing

MiniMax M3 is billed pay-as-you-go in USD against your api.reapi.ai token balance. It bills along three dimensions — input tokens, output tokens, and cache-read tokens. Current rates live on api.reapi.ai/pricing and in the pricing card at the top of the model page.

Per-call bill (rates are in USD per 1M tokens):

input_bill      = (prompt_tokens - cached_tokens) × input_rate      / 1,000,000
cache_read_bill = cached_tokens                   × cache_read_rate / 1,000,000
output_bill     = completion_tokens               × output_rate     / 1,000,000
total_bill      = input_bill + cache_read_bill + output_bill

Output tokens include the chain-of-thought when the model is thinking. Failed requests are not charged.
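
The per-call formula can be applied to a usage object as in the sketch below. The rates used here are placeholders chosen for easy arithmetic, not real prices (see api.reapi.ai/pricing for those), and estimate_cost_usd is a hypothetical helper name.

```python
def estimate_cost_usd(usage, input_rate, output_rate, cache_read_rate):
    """Apply the per-call formula; all rates are USD per 1M tokens."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    input_bill = (usage["prompt_tokens"] - cached) * input_rate / 1_000_000
    cache_read_bill = cached * cache_read_rate / 1_000_000
    output_bill = usage["completion_tokens"] * output_rate / 1_000_000
    return input_bill + cache_read_bill + output_bill


# Placeholder rates and token counts, for illustration only.
usage = {
    "prompt_tokens": 100_000,
    "completion_tokens": 20_000,
    "prompt_tokens_details": {"cached_tokens": 80_000},
}
cost = estimate_cost_usd(usage, input_rate=1.0, output_rate=4.0, cache_read_rate=0.1)
print(f"${cost:.4f}")  # $0.1080
```

With these placeholder numbers the 80,000 cached tokens cost 0.008 USD, where the same tokens at the full input rate would cost 0.08 USD.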


Limits

Limit                  Value
Context window         1M tokens
Max output per call    512K tokens

Streams that hit the output cap finish with finish_reason: "length"; call again with a continuation message if you need more text.
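
One way to handle the length case is a small loop that keeps asking for more until the model stops on its own. The sketch below is an assumption about what a continuation message could look like (the partial answer replayed as an assistant turn, followed by a user turn asking the model to continue); the document does not prescribe a format. The send callable and fake_send stub are hypothetical stand-ins for a real API call.

```python
def complete_with_continuation(send, messages, max_rounds=3):
    """Call send() repeatedly while the reply stops with finish_reason "length".

    send(messages) must return (text, finish_reason) for one request.
    """
    messages = list(messages)
    pieces = []
    for _ in range(max_rounds):
        text, finish_reason = send(messages)
        pieces.append(text)
        if finish_reason != "length":
            break
        # Assumption: replay the partial answer, then ask the model to go on.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})
    return "".join(pieces)


# Stand-in for a real API call, so the sketch runs offline.
replies = iter([("Part one, ", "length"), ("part two.", "stop")])


def fake_send(messages):
    return next(replies)


result = complete_with_continuation(fake_send, [{"role": "user", "content": "Write a report."}])
print(result)  # Part one, part two.
```

The max_rounds cap keeps a runaway generation from looping indefinitely, since each round is billed as a separate call.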


Errors

The error envelope follows the OpenAI shape — HTTP status, plus a JSON body:

{
  "error": {
    "message": "...",
    "type": "invalid_request_error",
    "code": "..."
  }
}

Common cases:

Status  When                                        Notes
400     Bad request shape, unsupported param combo  Check the messages array and model id
401     Missing / invalid API key                   Re-issue a key at api.reapi.ai
402     Insufficient balance                        Top up at api.reapi.ai
429     Per-group rate limit hit                    Back off, or move to a different group
500     Upstream / gateway error                    Safe to retry; failed calls are not charged

api.reapi.ai does not retry chat requests internally: every call you make maps to exactly one upstream POST. A network error that reaches you therefore corresponds to a single failed upstream request, so retrying from your side is safe and the gateway will not double-bill.
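
Because retries are left to the caller, a client-side wrapper is a natural fit. The sketch below retries only on the 429 and 500 statuses from the table above, using exponential backoff with jitter. The backoff values, the send callable, and the helper name call_with_retry are assumptions for illustration; the sketch covers HTTP statuses only, not raised network exceptions.

```python
import random
import time

RETRYABLE_STATUSES = {429, 500}


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send() must return (status_code, body) for one request.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff with jitter: roughly 1s, 2s, 4s, ...
        sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2))


# Stand-in for a real HTTP call, so the sketch runs offline.
outcomes = iter([(429, "rate limited"), (500, "upstream error"), (200, "ok")])
delays = []
status, body = call_with_retry(lambda: next(outcomes), sleep=delays.append)
print(status, body, len(delays))  # 200 ok 2
```

Statuses such as 400, 401, and 402 are returned immediately, since repeating the same request would not change the outcome.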


Recipes

Minimum request

{
  "model": "minimax/minimax-m3",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "Summarise this in three sentences." }
  ]
}

Tool use (function calling)

{
  "model": "minimax/minimax-m3",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "What's the weather in Tokyo today?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
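
When the model answers this request with a tool call, the client runs the function and sends the result back as a tool message. The sketch below shows that step, assuming the response follows the standard OpenAI tool-call shape (a tool_calls array whose function.arguments field is a JSON string). The get_weather implementation, TOOL_REGISTRY, run_tool_calls, and the sample assistant message are all hypothetical.

```python
import json


def get_weather(city):
    # Hypothetical local implementation of the tool declared in the request.
    return {"city": city, "forecast": "sunny", "temp_c": 21}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each requested tool and return the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
        output = TOOL_REGISTRY[function["name"]](**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results


# Hand-written example of an assistant turn that requests one tool call.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message)
```

To continue the run, append assistant_message and then tool_messages to the messages array and send the request again with the same tools list.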

Vision

{
  "model": "minimax/minimax-m3",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Read the error in this screenshot and suggest a fix." },
        {
          "type": "image_url",
          "image_url": { "url": "https://your-cdn.com/screenshot.png" }
        }
      ]
    }
  ]
}

Long-context analysis

{
  "model": "minimax/minimax-m3",
  "max_tokens": 8192,
  "messages": [
    { "role": "system", "content": "<a long, stable reference document>" },
    { "role": "user", "content": "List every mention of constraint X with line numbers." }
  ]
}

Keep the long reference block stable across calls so the cache-read rate applies on subsequent requests.


When to pick MiniMax M3

Pick MiniMax M3 when you want frontier coding and agentic capability at open-weight pricing:

  • Long-horizon agentic coding — multi-file refactors, tool-using agents, and runs that must stay on-task across many steps.
  • Million-token analysis — whole repositories, long research packs, and multi-document review in a single call.
  • Multimodal workflows — tasks that mix screenshots, diagrams, video, and code in one conversation.

Route lighter traffic (classification, short replies, tight loops) to a cheaper model on the same key.


Tips

  • Set max_tokens generously when the task is hard. The chain-of-thought counts toward the output budget; a low cap can truncate before the final answer.
  • Strip reasoning_content before the next turn. Re-feeding a prior turn's chain-of-thought as input is not supported.
  • Stream by default for chat UX. Streaming cuts perceived latency.
  • Cache stable prefixes. Reuse the same system prompt and tool schemas across calls to bill repeated input at the low cache-read rate.
  • Disable thinking for simple, latency-sensitive calls. Adaptive thinking already skips reasoning on easy prompts, but you can force it off when you never need the chain-of-thought.

Related

  • Authentication
  • Quickstart
  • Errors catalog
