reAPI Docs

deepseek-v4

DeepSeek V4 API — Flash and Pro open-weight models on one OpenAI-compatible /v1/chat/completions endpoint on api.reapi.ai. 1M context, 384K max output, thinking mode by default, vision input, and tool use.

The DeepSeek V4 API ships two open-weight models — deepseek-v4-flash (fast, low-cost) and deepseek-v4-pro (frontier reasoning and agentic coding) — exposed through api.reapi.ai as a drop-in OpenAI-compatible Chat Completions endpoint. Both bring a 1M-token context window, 384K max output, thinking mode on by default, vision input, tool use, and context caching. Current rates live on the model page and on api.reapi.ai/pricing.

Quick example

cURL (-N turns off output buffering so streamed events print as they arrive)

curl -N https://api.reapi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "group": "default",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "stream": true,
    "max_tokens": 4096,
    "temperature": 0.7
  }'

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.reapi.ai/v1",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",  # or "deepseek-v4-pro"
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    max_tokens=4096,
    temperature=0.7,
    extra_body={"group": "default"},
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.reapi.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash", // or "deepseek-v4-pro"
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  max_tokens: 4096,
  temperature: 0.7,
  // `group` is an api.reapi.ai-specific extension. It is sent as a top-level
  // field of the request body; only the TypeScript types object to it.
  // @ts-expect-error (not part of the OpenAI types)
  group: "default",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
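
Go

package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

func main() {
    body, err := json.Marshal(map[string]any{
        "model": "deepseek-v4-flash", // or "deepseek-v4-pro"
        "group": "default",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello"},
        },
        "stream":      true,
        "max_tokens":  4096,
        "temperature": 0.7,
    })
    if err != nil {
        log.Fatal(err)
    }
    req, err := http.NewRequest("POST",
        "https://api.reapi.ai/v1/chat/completions", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // With "stream": true the body is an SSE stream, so print each
    // line as it arrives instead of waiting for the whole response.
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        fmt.Println(scanner.Text())
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}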

Authentication

Every request needs a Bearer token. The DeepSeek V4 chat workspace lives on the api.reapi.ai platform — sign in there to create a key and top up tokens.

  1. Open api.reapi.ai and sign in (or create an account).
  2. Generate an API key under API Keys.
  3. Top up tokens under Top Up (pay-as-you-go, billed in USD per 1M tokens — see api.reapi.ai/pricing).

Send the key in the Authorization header on every request:

Authorization: Bearer YOUR_API_KEY

The chat surface (api.reapi.ai) is a separate workspace from the image/video/audio task gateway at reapi.ai/api/v1/*. Keys and balances do not cross over — a key issued on reapi.ai/settings/apikeys will not authenticate against api.reapi.ai/v1/chat/completions, and vice versa.


Models

The DeepSeek V4 family ships two variants. Both share the same endpoint, request shape, 1M context window, and 384K max output — pick the variant with the model field.

model | Best for | Architecture
deepseek-v4-flash | Fast, low-cost everyday work: autocomplete, batch analysis, chat backends. Reasoning closely approaches Pro. | 284B total / 13B active (MoE)
deepseek-v4-pro | Frontier reasoning, complex debugging, and agentic coding. Rivals top closed-source models. | 1.6T total / 49B active (MoE)

The legacy ids deepseek-chat and deepseek-reasoner map to deepseek-v4-flash in non-thinking and thinking mode respectively. New integrations should use the explicit deepseek-v4-flash / deepseek-v4-pro ids.


Endpoint

POST https://api.reapi.ai/v1/chat/completions

Drop-in for the OpenAI SDKs — same request shape, same SSE wire format. Set base_url to https://api.reapi.ai/v1. DeepSeek V4 also supports the Anthropic API format natively; this guide documents the OpenAI-compatible Chat Completions surface.


Request body

model — string, required

"deepseek-v4-flash" or "deepseek-v4-pro". Echoed back in the response envelope.

messages — array, required

Conversation history as an array of message objects. Same shape as the OpenAI Chat Completions spec, plus content-parts for vision:

{
  "role": "system" | "user" | "assistant" | "tool",
  "content": "string OR content-parts array (text + image_url parts)"
}

Multi-turn history is sent in chronological order — the last message is the one the model responds to. Do not echo a prior turn's reasoning_content back into messages; strip it before the next request.

max_tokens — integer, default 4096

Upper bound on output tokens for this response, including the chain-of-thought when thinking mode is on. The synchronous API supports up to 384K output tokens — set it generously for long-form or reasoning-heavy outputs.

stream — boolean, default false

When true, the response is streamed as server-sent events (SSE) with Content-Type: text/event-stream. Each event is a JSON delta in the OpenAI format, terminated by a data: [DONE] line.

temperature — number, default 1

Sampling temperature. Lower values produce more deterministic output. Ignored while the model is in thinking mode.

top_p — number, default 1

Nucleus sampling cutoff. Ignored in thinking mode.

frequency_penalty / presence_penalty — number, default 0

Standard OpenAI repetition controls. Ignored in thinking mode.

tools / tool_choice — optional

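Standard OpenAI tool-calling parameters: tools lists the functions the model may call, and tool_choice controls whether it calls one (the tool-use recipe uses "auto"). DeepSeek V4 is optimized for agentic work, with reliable function calling and JSON output.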

group — string, default "default"

api.reapi.ai-specific extension. Selects a token group on the gateway, which routes the request to a specific upstream channel pool. Omit if default routing is fine.


Thinking mode

DeepSeek V4 runs in thinking mode by default: before the final answer it produces a chain of thought, returned in a reasoning_content field at the same level as content.

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "Let me work through this step by step...",
        "content": "The final answer."
      },
      "finish_reason": "stop"
    }
  ]
}

For latency-sensitive or simple calls, switch to non-thinking mode for faster, cheaper responses. When thinking is on, the sampling parameters (temperature, top_p, frequency_penalty, presence_penalty) have no effect.

Strip reasoning_content from assistant messages before sending them back in a follow-up request — the chain-of-thought from a previous turn is not meant to be re-fed as input.
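
As an illustration of the stripping step, here is a minimal Python sketch. The helper name strip_reasoning and the sample history are made up for this example; only the reasoning_content field name comes from the API.

```python
from typing import Any


def strip_reasoning(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of the history with reasoning_content removed from every message."""
    return [
        {key: value for key, value in message.items() if key != "reasoning_content"}
        for message in messages
    ]


# Hypothetical history: the assistant turn still carries its chain of thought.
history = [
    {"role": "user", "content": "What is 17 * 3?"},
    {
        "role": "assistant",
        "reasoning_content": "17 * 3 = 51.",
        "content": "51",
    },
    {"role": "user", "content": "Now add 9."},
]

clean_history = strip_reasoning(history)
print(clean_history[1])  # {'role': 'assistant', 'content': '51'}
```

The helper copies each message instead of mutating it, so the original history (including the reasoning) stays available for logging.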


Vision input (beta)

Send images alongside text via OpenAI content-parts:

{
  "model": "deepseek-v4-pro",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What does this chart show?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/chart.png" }
        }
      ]
    }
  ]
}

Each image counts toward the input token budget based on its resolution.


Context caching

DeepSeek V4 caches stable prompt prefixes automatically. When a request hits the cache, the cached input tokens bill at a small fraction of the standard input rate — a big saving for agent loops and chatbots that replay long system prompts and tool schemas. No configuration is required; reuse the same prefix across calls and the discount applies. The usage.prompt_tokens_details.cached_tokens field reports how many input tokens were served from cache.
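
To check whether a stable prefix is actually being served from cache, one option is to track the cached share of each call's prompt tokens. The sketch below assumes only the usage fields named above; the helper name cached_fraction and the sample numbers are illustrative.

```python
def cached_fraction(usage: dict) -> float:
    """Share of prompt tokens served from cache, in the range 0.0 to 1.0."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens", 0)
    if prompt_tokens <= 0:
        return 0.0
    return cached_tokens / prompt_tokens


# Illustrative usage object; the numbers are made up for this example.
usage = {
    "prompt_tokens": 2000,
    "completion_tokens": 150,
    "total_tokens": 2150,
    "prompt_tokens_details": {"cached_tokens": 1500},
}
print(cached_fraction(usage))  # 0.75
```

A fraction that stays near zero across repeated calls usually means the prefix is changing between requests (for example, a timestamp placed ahead of the system prompt).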


Response shape

Non-streaming (stream: false)

{
  "id": "chatcmpl-018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "object": "chat.completion",
  "created": 1735000000,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

When thinking mode is on, message.reasoning_content carries the chain-of-thought alongside content.

Streaming (stream: true)

Content-Type: text/event-stream. Each data: line is a JSON delta in the OpenAI chunk format; the final event before [DONE] carries the finish_reason (stop / length / tool_calls / content_filter).
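
If you read the stream without an SDK, the data: lines can be reassembled by hand. The following Python sketch assumes the OpenAI chunk layout (choices[].delta.content and choices[].finish_reason); the function name collect_stream and the sample lines are hypothetical, not captured gateway output.

```python
import json
from typing import Iterable, Optional


def collect_stream(lines: Iterable[str]) -> tuple[str, Optional[str]]:
    """Join delta.content across SSE lines and return (text, finish_reason)."""
    parts: list[str] = []
    finish_reason: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


# Hand-written sample events in the OpenAI chunk format.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text, reason = collect_stream(sample)
print(text, reason)  # Hello stop
```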


Pricing

DeepSeek V4 is billed pay-as-you-go in USD against your api.reapi.ai token balance. It bills along three dimensions — input tokens (cache miss), input tokens (cache hit), and output tokens — and deepseek-v4-pro costs more per token than deepseek-v4-flash. Current rates live on api.reapi.ai/pricing and in the pricing card at the top of the model page.

Per-call bill (rates are in USD per 1M tokens):

billable_input  = (prompt_tokens - cached_tokens) × input_rate      / 1,000,000
cache_read_bill = cached_tokens                   × cache_hit_rate   / 1,000,000
output_bill     = completion_tokens               × output_rate      / 1,000,000
total_bill      = billable_input + cache_read_bill + output_bill

Output tokens include the chain-of-thought when thinking mode is on. Failed requests are not charged.
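
The per-call formula can be sketched in Python as follows. The rates in the example are placeholders chosen to make the arithmetic easy to follow, not real prices; take actual values from api.reapi.ai/pricing. The Rates class and call_cost function are names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. Real values come from api.reapi.ai/pricing."""
    input_rate: float
    cache_hit_rate: float
    output_rate: float


def call_cost(usage: dict, rates: Rates) -> float:
    """Apply the three-part billing formula to one response's usage object."""
    prompt_tokens = usage["prompt_tokens"]
    cached_tokens = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    completion_tokens = usage["completion_tokens"]

    billable_input = (prompt_tokens - cached_tokens) * rates.input_rate / 1_000_000
    cache_read_bill = cached_tokens * rates.cache_hit_rate / 1_000_000
    output_bill = completion_tokens * rates.output_rate / 1_000_000
    return billable_input + cache_read_bill + output_bill


# PLACEHOLDER rates for illustration only, not the actual price list.
example_rates = Rates(input_rate=1.0, cache_hit_rate=0.1, output_rate=2.0)
usage = {
    "prompt_tokens": 100_000,
    "completion_tokens": 20_000,
    "prompt_tokens_details": {"cached_tokens": 60_000},
}
cost = call_cost(usage, example_rates)
print(f"{cost:.4f}")  # 0.0860
```

With these placeholder rates the 60,000 cached tokens cost 0.006 USD instead of the 0.06 USD they would cost at the uncached input rate.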


Limits

Limit | Value
Context window | 1M tokens
Max output per call | 384K tokens

Responses that hit the output cap, streamed or not, finish with finish_reason: "length"; call again with a continuation message if you need more text.
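
One possible shape for the continuation loop is sketched below. It is an assumption, not a documented pattern: the partial answer is replayed as an assistant message (without reasoning_content) followed by a user message asking the model to continue. The send callable stands in for whatever function performs the HTTP request and returns the parsed chat.completion body; fake_send exists only to make the example runnable offline.

```python
from typing import Callable

# Takes the message history, returns a parsed chat.completion dict.
Send = Callable[[list[dict]], dict]


def complete_with_continuation(send: Send, messages: list[dict], max_rounds: int = 3) -> str:
    """Call `send` until finish_reason is not "length" or max_rounds is reached."""
    history = list(messages)
    pieces: list[str] = []
    for _ in range(max_rounds):
        choice = send(history)["choices"][0]
        text = choice["message"].get("content") or ""
        pieces.append(text)
        if choice.get("finish_reason") != "length":
            break
        # Assumed continuation pattern: replay the partial answer and ask for more.
        history = history + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Continue exactly where you left off."},
        ]
    return "".join(pieces)


def fake_send(history: list[dict]) -> dict:
    """Stand-in for a real API call: truncates on the first call only."""
    first_call = len(history) == 1
    return {"choices": [{
        "message": {
            "role": "assistant",
            "content": "Part one. " if first_call else "Part two.",
        },
        "finish_reason": "length" if first_call else "stop",
    }]}


result = complete_with_continuation(
    fake_send, [{"role": "user", "content": "Write a long essay."}]
)
print(result)  # Part one. Part two.
```

The max_rounds bound keeps a response that never finishes from looping indefinitely; each extra round is billed as a separate call.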


Errors

The error envelope follows the OpenAI shape — HTTP status, plus a JSON body:

{
  "error": {
    "message": "...",
    "type": "invalid_request_error",
    "code": "..."
  }
}

Common cases:

Status | When | Notes
400 | Bad request shape, unsupported param combo | Check the messages array and model id
401 | Missing / invalid API key | Re-issue a key at api.reapi.ai
402 | Insufficient balance | Top up at api.reapi.ai
429 | Per-group rate limit hit | Back off, or move to a different group
500 | Upstream / gateway error | Safe to retry; failed calls are not charged

api.reapi.ai does not retry chat requests internally: every call you make maps to exactly one upstream POST. A network error that reaches you therefore reflects a single failed attempt, so a retry from your side is safe and the gateway will not double-bill.
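
Because the gateway leaves retries to the caller, a client-side backoff wrapper is one way to handle 429 and 500 responses. The sketch below is an assumption-laden example: the send callable is a hypothetical function that performs one request and returns (status, parsed_body), the delay values are arbitrary, and network exceptions are not handled.

```python
import random
import time
from typing import Callable

RETRYABLE = {429, 500}  # statuses the table above marks as back-off / safe to retry


def call_with_backoff(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Retry `send` on 429/500 with exponential backoff plus jitter."""
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status not in RETRYABLE:
            break
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
        status, body = send()
    return status, body


# Offline demo: two failures, then success. Delays are recorded, not slept.
responses = iter([
    (429, {"error": {"message": "rate limited"}}),
    (500, {"error": {"message": "upstream error"}}),
    (200, {"choices": []}),
])
delays: list[float] = []
status, body = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(status, len(delays))  # 200 2
```

Statuses such as 400, 401, and 402 are returned immediately, since repeating the same request would not change the outcome.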


Recipes

Minimum request

{
  "model": "deepseek-v4-flash",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "Summarise this in three sentences." }
  ]
}

Tool use (function calling)

{
  "model": "deepseek-v4-pro",
  "max_tokens": 4096,
  "messages": [
    { "role": "user", "content": "What's the weather in Tokyo today?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
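
When the model decides to call the function, the response finishes with finish_reason: "tool_calls" and the assistant message carries a tool_calls array. The sketch below assumes the standard OpenAI tool-call layout (id, function.name, function.arguments as a JSON string); the local get_weather implementation, the TOOLS registry, and the sample assistant message are hypothetical.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local implementation; returns canned data for the demo."""
    return {"city": city, "forecast": "sunny", "temp_c": 21}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool call and return the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
        output = TOOLS[function["name"]](**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results


# Hand-written example of an assistant message that requests one tool call.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message)
print(tool_messages[0]["content"])
# {"city": "Tokyo", "forecast": "sunny", "temp_c": 21}
```

To get the final answer, append the assistant message and the returned tool messages to the history and send the request again with the same tools array.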

Long-context analysis

{
  "model": "deepseek-v4-pro",
  "max_tokens": 8192,
  "messages": [
    { "role": "system", "content": "<a long, stable reference document>" },
    { "role": "user", "content": "List every mention of constraint X with line numbers." }
  ]
}

Keep the long reference block stable across calls so the cache-hit rate applies on subsequent requests.


When to pick Flash vs Pro

  • deepseek-v4-flash: latency-sensitive, high-throughput, cost-sensitive work such as in-IDE autocomplete, inline suggestions, CI code review, bulk summarization, and chat backends. Reasoning closely approaches Pro at a fraction of the price.
  • deepseek-v4-pro: work where reasoning depth dominates, such as complex debugging, architecture planning, math/STEM, and long-horizon agentic coding.

Both models share one key, so you can route per request.

Tips

  • Set max_tokens generously when thinking is on. The chain-of-thought counts toward the output budget; a low cap can truncate before the final answer.
  • Strip reasoning_content before the next turn. Re-feeding a prior turn's chain-of-thought as input is not supported.
  • Stream by default for chat UX. Streaming cuts perceived latency.
  • Cache stable prefixes. Reuse the same system prompt and tool schemas across calls to bill repeated input at the low cache-hit rate.
  • Route by difficulty. Send simple, high-volume calls to Flash and reserve Pro for the hardest reasoning, all on one key.

Related

  • Authentication
  • Quickstart
  • Errors catalog
