DeepSeek V4 API — Flash & Pro, 1M Context

The DeepSeek V4 API ships two open-weight models on one OpenAI-compatible endpoint — Flash for fast, low-cost everyday work and Pro for frontier reasoning, agentic coding, and STEM. Both bring a 1M-token context window, 384K max output, thinking mode on by default, vision input, tool use, and context caching. Pay-as-you-go in USD.

Price
est. $0.14 to $3.48 per 1M tokens

What you can build

Real workflows powered by this model.

Long-horizon agentic coding with DeepSeek V4 Pro

DeepSeek V4 Pro is the flagship of the DeepSeek V4 API — a 1.6T-parameter mixture-of-experts model (49B active) tuned for agentic coding, complex reasoning, and STEM. DeepSeek reports open-source state-of-the-art results on agentic coding benchmarks, and V4 is integrated with agent harnesses like Claude Code, OpenClaw, and OpenCode. Point a coding agent at the DeepSeek V4 API and it scopes the task, calls tools, and reasons through multi-step work in one run.

High-throughput, low-cost work with DeepSeek V4 Flash

DeepSeek V4 Flash is the fast lane of the DeepSeek V4 API: 284B parameters (13B active), with reasoning that closely approaches Pro at a fraction of the cost. Use Flash for in-IDE autocomplete, inline suggestions, CI-stage code review, bulk summarization, and chat backends. With context caching, repeated system prompts and tool schemas are billed at the low cache-hit rate, so agent loops and high-volume traffic stay cheap.

Million-token codebase and document analysis

Both DeepSeek V4 API models default to a 1M-token context window — enough to load a whole mid-size repository, a long research pack, or a multi-turn agent trace in a single call. DeepSeek Sparse Attention keeps long-context inference efficient, so DeepSeek V4 API workloads like architecture review, dependency audits, and migration planning rarely need chunking.

Pricing

Credit-based — 1 credit = $0.001 USD. Pay only for completed generations.

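| Model | Category | Unit | Price |
| --- | --- | --- | --- |
| DeepSeek V4 Flash | Input (cache miss) | 1M tokens | $0.14 |
| DeepSeek V4 Flash | Input (cache hit) | 1M tokens | $0.0028 |
| DeepSeek V4 Flash | Output | 1M tokens | $0.28 |
| DeepSeek V4 Pro | Input (cache miss) | 1M tokens | $1.74 |
| DeepSeek V4 Pro | Input (cache hit) | 1M tokens | $0.0145 |
| DeepSeek V4 Pro | Output | 1M tokens | $3.48 |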
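
To make the pricing arithmetic concrete, here is a minimal sketch that estimates the cost of one request from its token counts. The rates and the credit conversion are copied from this page; the function names, and the assumption that cached and uncached input tokens are billed separately at the two input rates, are illustrative and not taken from the reAPI billing documentation.

```python
# USD per 1M tokens, copied from the pricing section of this page.
PRICES_PER_MILLION = {
    "deepseek-v4-flash": {"input_miss": 0.14, "input_hit": 0.0028, "output": 0.28},
    "deepseek-v4-pro": {"input_miss": 1.74, "input_hit": 0.0145, "output": 3.48},
}

USD_PER_CREDIT = 0.001  # "1 credit = $0.001 USD" per this page


def estimate_cost_usd(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes cached input tokens are billed at the cache-hit rate and the
    remaining input tokens at the cache-miss rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    rates = PRICES_PER_MILLION[model]
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * rates["input_miss"]
        + cached_input_tokens * rates["input_hit"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000


def usd_to_credits(usd):
    """Convert a USD amount to credits at the rate stated on this page."""
    return usd / USD_PER_CREDIT


# One Pro agent turn: 50,000 input tokens (40,000 served from cache), 2,000 output.
with_cache = estimate_cost_usd("deepseek-v4-pro", 50_000, 2_000, cached_input_tokens=40_000)
without_cache = estimate_cost_usd("deepseek-v4-pro", 50_000, 2_000)
print(f"{with_cache:.5f}")     # 0.02494
print(f"{without_cache:.5f}")  # 0.09396
```

In this example the cached prefix cuts the cost of the turn from roughly $0.094 to roughly $0.025, which is why a stable system prompt matters in long agent loops.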

Why reAPI

OpenAI-compatible drop-in — plus an Anthropic surface

The DeepSeek V4 API speaks the OpenAI Chat Completions format. Moving an existing OpenAI integration to the DeepSeek V4 API takes a base URL, an API key, and a model-string change (`deepseek-v4-flash` or `deepseek-v4-pro`), not a platform rewrite. You keep the same `messages` array and the same streaming format, and there is a native Anthropic-style surface for SDK callers that prefer it.
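
As a rough illustration of the drop-in claim, the sketch below builds a Chat Completions request with only the Python standard library. The endpoint URL, the bearer-token header, and the model strings come from this page; the helper name `build_chat_request` and its defaults are hypothetical, and nothing is sent until you pass the request to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Endpoint as given on this page: POST https://api.reapi.ai/v1/chat/completions
BASE_URL = "https://api.reapi.ai/v1"


def build_chat_request(api_key, model, messages, max_tokens=1024, stream=False):
    """Build (but do not send) an OpenAI-style chat request (hypothetical helper)."""
    payload = {
        "model": model,          # "deepseek-v4-flash" or "deepseek-v4-pro"
        "messages": messages,    # same shape as an OpenAI messages array
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    api_key="YOUR_API_KEY",
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
# To actually call the API:
#   with urllib.request.urlopen(request) as resp:
#       print(json.load(resp))
```

Switching to Pro is then the one-line change the page describes: pass `model="deepseek-v4-pro"` and leave everything else alone.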

Frontier reasoning at value pricing

The DeepSeek V4 API is open-weight and priced to match. Pro rivals top closed-source models on reasoning, math, and coding while costing a fraction of them per token; Flash drops the price by another order of magnitude for everyday traffic. Run premium work on Pro and route high-volume calls to Flash on the same key.

One key across DeepSeek, GPT, Claude, and Gemini

A single api.reapi.ai key unlocks the DeepSeek V4 API alongside GPT-5.5, Claude Opus 4.8, Gemini, and every other frontier chat model on the platform. Compare vendors, add fallbacks, and route traffic per call with a configuration change instead of an integration project.
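
A fallback chain of the kind described above might look like the following sketch. It is client-side logic only and assumes nothing about how reAPI routes traffic internally; `complete_with_fallback` and the `send` callable are hypothetical names, and the model IDs shown are the two DeepSeek strings from this page.

```python
from typing import Callable, Sequence, Tuple


def complete_with_fallback(
    models: Sequence[str],
    send: Callable[[str], dict],
) -> Tuple[str, dict]:
    """Try each model in order and return (model_used, response).

    `send` is any function that performs one request for the given model
    and raises an exception on failure.
    """
    errors = []
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # broad on purpose: this is only a sketch
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Usage with a stand-in sender that simulates a Pro outage.
def fake_send(model: str) -> dict:
    if model == "deepseek-v4-pro":
        raise TimeoutError("simulated timeout")
    return {"model": model, "choices": []}


used, response = complete_with_fallback(
    ["deepseek-v4-pro", "deepseek-v4-flash"], fake_send
)
print(used)  # deepseek-v4-flash
```

Because the model list is plain data, changing the order or adding another vendor's model ID is a configuration change rather than new integration code.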

DeepSeek V4 vs DeepSeek V3.2

The DeepSeek V4 API is a generational jump over V3.2 — a bigger context window, two model tiers, thinking on by default, vision, and agent-focused tuning. Here is what changed between the two.

| Capability | DeepSeek V4 API on reAPI | DeepSeek V3.2 |
| --- | --- | --- |
| Model lineup | Two variants, Flash (284B / 13B active) and Pro (1.6T / 49B active), on the same API key. | A single chat / reasoner model line. |
| Context window | 1M tokens by default, with DeepSeek Sparse Attention for efficient long context. | 128K-token context window. |
| Max output | Up to 384K output tokens per response. | Substantially smaller output cap. |
| Thinking mode | On by default, with a dual thinking / non-thinking switch and chain-of-thought in `reasoning_content`. | Reasoning available through a separate reasoner model. |
| Vision input | Image input supported (beta) on the same endpoint. | Text-only. |
| Agentic tuning | Dedicated agent optimizations; integrated with leading coding-agent harnesses; open-source SOTA on agentic coding per DeepSeek. | Capable general model without V4's agent-specific tuning. |

Comparison reflects publicly documented behavior from DeepSeek's V4 release notes and model documentation at the time of writing. Some benchmark claims are vendor-reported. Model behavior and pricing can change; check the pricing card above and the API docs for current values.
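
Since thinking mode returns chain-of-thought in `reasoning_content`, a client usually wants to separate it from the final answer. The sketch below assumes a non-streaming, OpenAI-style response in which `reasoning_content` sits next to `content` on the assistant message; that placement is an assumption here and should be checked against the API docs. The helper name `split_reasoning` is hypothetical.

```python
from typing import Tuple


def split_reasoning(response: dict) -> Tuple[str, str]:
    """Return (reasoning, answer) from a parsed chat completion response.

    Assumes the fields live at choices[0].message; missing or null fields
    are returned as empty strings.
    """
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


# Usage with a hand-written response body (not real model output).
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "Add the two numbers.",
                "content": "4",
            }
        }
    ]
}
reasoning, answer = split_reasoning(sample)
print(answer)  # 4
```

Keeping the two strings apart makes it easy to log the reasoning for debugging while showing only the answer to end users.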

Ship the DeepSeek V4 API in three steps

  1. Create an account and key on api.reapi.ai

    Sign up at api.reapi.ai, open the console, generate an API key under API Keys, and top up tokens under Top Up. The chat workspace is separate from the reapi.ai image/video gateway; keys do not cross over.

  2. Send your first request

    POST https://api.reapi.ai/v1/chat/completions with `model` set to `deepseek-v4-flash` (or `deepseek-v4-pro`), your `messages` array, and `max_tokens`. The DeepSeek V4 API endpoint is OpenAI-compatible, including streamed responses; switch models with a one-line change.

  3. Tune for cost and reasoning

    Across the DeepSeek V4 API, reach for Flash on latency-sensitive, high-throughput work and Pro when a task needs deep reasoning. Keep system prompts stable so repeated input is billed at the low cache-hit rate, and toggle thinking mode off for the fastest, cheapest replies.
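
For the streamed responses mentioned in step 2, the sketch below reassembles the reply text from server-sent event lines. It assumes the usual OpenAI-compatible stream shape (`data: {json}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`); the page states the format is OpenAI-compatible but does not spell it out, so treat the exact shape as an assumption. `collect_stream` is a hypothetical helper name.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate delta.content pieces from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            piece = delta.get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)


# Usage with hand-written sample lines (not real model output).
sample_lines = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample_lines))  # Hello
```

In a real client the `lines` iterable would come from the HTTP response body read line by line, decoded as UTF-8.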

Frequently asked questions

Common questions about this model.

How is the DeepSeek V4 API billed?

The DeepSeek V4 API is billed pay-as-you-go in USD against your api.reapi.ai token balance. The pricing card on this page shows the live per-1M-token input and output rates for both Flash and Pro, plus the cache-hit rate. Cache hits are dramatically cheaper than re-sending the same tokens, and failed requests are not charged.

Related models

Explore more models in the same category.

GPT-5.4 (OpenAI)

OpenAI's GPT-5.4 with 1M context and 128K max output: the cost-efficient GPT route.

From $1.00 per 1M tokens

Claude Opus 4.7 (Anthropic)

Anthropic's Claude Opus 4.7: 1M context, 128K output, premium coding and agent reasoning.

From $2.00 per 1M tokens

Claude Sonnet 4.6 (Anthropic)

Anthropic's Claude Sonnet 4.6: balanced quality and speed for everyday production chat, code review, and mid-complexity agents.

From $2.00 per 1M tokens

GPT-5.5 (OpenAI)

OpenAI's GPT-5.5 with 1M context and 128K max output, behind one OpenAI-compatible reAPI key.

From $2.00 per 1M tokens

API reference

Drop-in code and the full parameter table.

curl https://api.reapi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "group": "default",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "stream": true,
    "max_tokens": 4096,
    "temperature": 0.7
  }'
reAPI is the AI API aggregator with sub-second failover, zero request logging, and one OpenAI-compatible endpoint for every top model.

© 2026 reAPI. All Rights Reserved.