What is the difference between DeepSeek V4 Flash and Pro?

Both are part of the DeepSeek V4 API and share a 1M context window, 384K max output, thinking mode, and tool use. Flash (284B / 13B active) is the fast, low-cost default for autocomplete, batch analysis, and chat backends; Pro (1.6T / 49B active) is the flagship for deep reasoning, complex debugging, and agentic coding. They share one DeepSeek V4 API key — mix them per request.

Does the DeepSeek V4 API use the standard chat-completions format?

Yes. The DeepSeek V4 API is a drop-in for OpenAI's /v1/chat/completions — same request shape, same `messages` array, same `stream` / `temperature` / `max_tokens` parameters, same SSE wire format. Most teams migrate by changing the base URL to https://api.reapi.ai/v1, swapping the API key, and setting `model` to `deepseek-v4-flash` or `deepseek-v4-pro`.

What is the DeepSeek V4 context window?

Both DeepSeek V4 API models default to a 1M-token context window and support up to 384K output tokens per response. DeepSeek Sparse Attention keeps long-context inference efficient, so you can feed entire repositories and long documents without chunking.

How does thinking mode work, and can I turn it off?

DeepSeek V4 runs in thinking mode by default: it produces a chain of thought before the final answer and returns it in a `reasoning_content` field alongside `content`. For latency-sensitive or simple calls you can switch to non-thinking mode for faster, cheaper responses — the model id stays the same.

Does the DeepSeek V4 API support vision and tool use?

Yes. The DeepSeek V4 API accepts image inputs (beta) alongside text in the same call, and supports function calling / tool use plus JSON output. It is tuned for agentic, multi-step workflows and integrates with leading coding-agent harnesses.

How is DeepSeek V4 different from DeepSeek V3.2?

The DeepSeek V4 API raises the default context window to 1M tokens (up from 128K), splits into two variants (Flash and Pro) instead of one, turns thinking on by default with a dual thinking / non-thinking mode, adds vision input, and ships dedicated agentic optimizations. Pro rivals top closed-source models on reasoning and coding while staying open-weight.

Where do I create an API key and buy tokens?

Both happen on api.reapi.ai — the chat workspace runs as its own platform separate from the image / video task gateway at reapi.ai. Sign up at api.reapi.ai, generate a key under API Keys, and top up under Top Up. A reapi.ai/settings/apikeys key will not authenticate against the chat endpoint.

DeepSeek V4 API — Flash & Pro, 1M Context

The DeepSeek V4 API ships two open-weight models on one unified endpoint — Flash for fast, low-cost everyday work and Pro for frontier reasoning, agentic coding, and STEM. Both bring a 1M-token context window, 384K max output, thinking mode on by default, vision input, tool use, and context caching. Pay-as-you-go in USD.

DeepSeek V4modeldeepseek-v4-flash

DeepSeek V4 playground

Open the chat playground to run DeepSeek V4 through the standard chat completions surface with your api.reapi.ai key.

Open chat playground

What you can build with this model

Real-world workflows and production use cases you can build and ship with this model.

DeepSeek V4 Pro driving a long-horizon agentic coding session

Long-horizon agentic coding with DeepSeek V4 Pro

DeepSeek V4 Pro is the flagship of the DeepSeek V4 API — a 1.6T-parameter mixture-of-experts model (49B active) tuned for agentic coding, complex reasoning, and STEM. DeepSeek reports open-source state-of-the-art results on agentic coding benchmarks, and V4 is integrated with agent harnesses like Claude Code, OpenClaw, and OpenCode. Point a coding agent at the DeepSeek V4 API and it scopes the task, calls tools, and reasons through multi-step work in one run.

Read the API docs

DeepSeek V4 Flash powering high-throughput coding and batch jobs

High-throughput, low-cost work with DeepSeek V4 Flash

DeepSeek V4 Flash is the fast lane of the DeepSeek V4 API — 284B parameters (13B active) whose reasoning closely approaches Pro at a fraction of the cost. Use the DeepSeek V4 API for in-IDE autocomplete, inline suggestions, CI-stage code review, bulk summarization, and chat backends. Context caching trims repeated system prompts and tool schemas to the low cache-hit rate, so agent loops and high-volume traffic stay cheap.

DeepSeek V4 reasoning across a million-token analysis pack

Million-token codebase and document analysis

Both DeepSeek V4 API models default to a 1M-token context window — enough to load a whole mid-size repository, a long research pack, or a multi-turn agent trace in a single call. DeepSeek Sparse Attention keeps long-context inference efficient, so DeepSeek V4 API workloads like architecture review, dependency audits, and migration planning rarely need chunking.

Pricing

Credit-based — 1 credit = $0.001 USD. Pay only for completed generations.

Category	Unit	Price
DeepSeek V4 Flash
Input (cache miss)	1M tokens	$0.14
Input (cache hit)	1M tokens	$0.0028
Output	1M tokens	$0.28
DeepSeek V4 Pro
Input (cache miss)	1M tokens	$1.74
Input (cache hit)	1M tokens	$0.0145
Output	1M tokens	$3.48

Why reAPI

Drop-in access — plus an Anthropic surface

The DeepSeek V4 API speaks OpenAI Chat Completions verbatim. Moving an existing OpenAI integration to the DeepSeek V4 API is a base URL, an API key, and a model-string change — `deepseek-v4-flash` or `deepseek-v4-pro` — not a platform rewrite. The same `messages` array, the same streaming format, and a native Anthropic-style surface for SDK callers that prefer it.

Frontier reasoning at value pricing

The DeepSeek V4 API is open-weight and priced to match. Pro rivals top closed-source models on reasoning, math, and coding while costing a fraction of them per token; Flash drops the price by another order of magnitude for everyday traffic. Run premium work on Pro and route high-volume calls to Flash on the same key.

One key across DeepSeek, GPT, Claude, and Gemini

A single api.reapi.ai key unlocks the DeepSeek V4 API alongside GPT-5.5, Claude Opus 4.8, Gemini, and every other frontier chat model on the platform. Compare vendors, add fallbacks, and route traffic per call with a configuration change instead of an integration project.

DeepSeek V4 vs DeepSeek V3.2

The DeepSeek V4 API is a generational jump over V3.2 — a bigger context window, two model tiers, thinking on by default, vision, and agent-focused tuning. Here is what changed between the two.

Capability

DeepSeek V4 API on reAPI

DeepSeek V3.2

Model lineup

Two variants — Flash (284B / 13B active) and Pro (1.6T / 49B active) — on the same API key.

A single chat / reasoner model line.

Context window

1M tokens by default, with DeepSeek Sparse Attention for efficient long context.

128K-token context window.

Max output

Up to 384K output tokens per response.

Substantially smaller output cap.

Thinking mode

On by default, with a dual thinking / non-thinking switch and chain-of-thought in `reasoning_content`.

Reasoning available through a separate reasoner model.

Vision input

Image input supported (beta) on the same endpoint.

Text-only.

Agentic tuning

Dedicated agent optimizations; integrated with leading coding-agent harnesses; open-source SOTA on agentic coding per DeepSeek.

Capable general model without V4's agent-specific tuning.

Comparison reflects publicly documented behavior from DeepSeek's V4 release notes and model documentation at the time of writing. Some benchmark claims are vendor-reported. Model behavior and pricing can change; check the pricing card above and the API docs for current values.

Ship the DeepSeek V4 API in three steps

step 01
Create an account and key on api.reapi.ai
Sign up at api.reapi.ai, open the console, generate an API key under API Keys, and top up tokens under Top Up. The chat workspace is separate from the reapi.ai image/video gateway — keys do not cross over.
Open
step 02
Send your first request
POST https://api.reapi.ai/v1/chat/completions with `model` set to `deepseek-v4-flash` (or `deepseek-v4-pro`), your `messages` array, and `max_tokens`. The DeepSeek V4 API endpoint uses the standard chat-completions format, including streamed responses; switch models with a one-line change.
Open
step 03
Tune for cost and reasoning
Across the DeepSeek V4 API, reach for Flash on latency-sensitive, high-throughput work and Pro when a task needs deep reasoning. Reuse stable system prompts to hit the low cache-read rate, and toggle thinking mode off for the fastest, cheapest replies.
Open

Frequently asked questions

Common questions about this model.

The DeepSeek V4 API is billed pay-as-you-go in USD against your api.reapi.ai token balance. The pricing card on this page shows the live per-1M-token input and output rates for both Flash and Pro, plus the cache-hit rate. Cache hits are dramatically cheaper than re-sending the same tokens, and failed requests are not charged.

Related models

Explore more models in the same category.

View all models

MiniMax

MiniMax M3

From $0.600 per 1M tokens

Chat

OpenAI

GPT-5.4

From $1.00 per 1M tokens

Chat

Anthropic

Claude Opus 4.7

From $2.00 per 1M tokens

Chat

Anthropic

Claude Sonnet 4.6

From $2.00 per 1M tokens

Chat

View all models

start building

Ready to ship?

Try it in the playground or grab an API key to integrate now.

Get API key View API docs

DeepSeek V4 API — Flash & Pro, 1M Context

What you can build with this model

Real-world workflows and production use cases you can build and ship with this model.

Long-horizon agentic coding with DeepSeek V4 Pro

Read the API docs

High-throughput, low-cost work with DeepSeek V4 Flash

Million-token codebase and document analysis

Category	Unit	Price
DeepSeek V4 Flash
Input (cache miss)	1M tokens	$0.14
Input (cache hit)	1M tokens	$0.0028
Output	1M tokens	$0.28
DeepSeek V4 Pro
Input (cache miss)	1M tokens	$1.74
Input (cache hit)	1M tokens	$0.0145
Output	1M tokens	$3.48