The DeepSeek V4 API ships two open-weight models on one OpenAI-compatible endpoint — Flash for fast, low-cost everyday work and Pro for frontier reasoning, agentic coding, and STEM. Both bring a 1M-token context window, 384K max output, thinking mode on by default, vision input, tool use, and context caching. Pay-as-you-go in USD.
Real workflows powered by this model.

DeepSeek V4 Pro is the flagship of the DeepSeek V4 API — a 1.6T-parameter mixture-of-experts model (49B active) tuned for agentic coding, complex reasoning, and STEM. DeepSeek reports open-source state-of-the-art results on agentic coding benchmarks, and V4 is integrated with agent harnesses like Claude Code, OpenClaw, and OpenCode. Point a coding agent at the DeepSeek V4 API and it scopes the task, calls tools, and reasons through multi-step work in one run.
DeepSeek V4 Flash is the fast lane of the DeepSeek V4 API: 284B parameters (13B active), with reasoning that closely approaches Pro at a fraction of the cost. Use Flash for in-IDE autocomplete, inline suggestions, CI-stage code review, bulk summarization, and chat backends. With context caching, repeated system prompts and tool schemas are billed at the low cache-hit rate, so agent loops and high-volume traffic stay cheap.

Both DeepSeek V4 API models default to a 1M-token context window — enough to load a whole mid-size repository, a long research pack, or a multi-turn agent trace in a single call. DeepSeek Sparse Attention keeps long-context inference efficient, so DeepSeek V4 API workloads like architecture review, dependency audits, and migration planning rarely need chunking.
Credit-based — 1 credit = $0.001 USD. Pay only for completed generations.
| Category | Unit | Price |
|---|---|---|
| **DeepSeek V4 Flash** | | |
| Input (cache miss) | 1M tokens | $0.14 |
| Input (cache hit) | 1M tokens | $0.0028 |
| Output | 1M tokens | $0.28 |
| **DeepSeek V4 Pro** | | |
| Input (cache miss) | 1M tokens | $1.74 |
| Input (cache hit) | 1M tokens | $0.0145 |
| Output | 1M tokens | $3.48 |
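To make the table concrete, here is a minimal Python sketch that estimates the cost of one call from the listed rates. The function names are hypothetical, and it assumes cached tokens are a subset of the input tokens and that every token in a bucket is billed at that bucket's rate; actual billing may round or bucket differently.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "deepseek-v4-flash": {"miss": 0.14, "hit": 0.0028, "output": 0.28},
    "deepseek-v4-pro": {"miss": 1.74, "hit": 0.0145, "output": 3.48},
}
USD_PER_CREDIT = 0.001  # 1 credit = $0.001 USD, as stated above


def estimate_cost_usd(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one call.

    Assumption: cached_tokens is the cache-hit share of input_tokens.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    rates = PRICES[model]
    total = (
        (input_tokens - cached_tokens) * rates["miss"]
        + cached_tokens * rates["hit"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000


def usd_to_credits(usd):
    """Convert a USD amount to platform credits."""
    return usd / USD_PER_CREDIT


# Example: 200K input tokens (150K of them cache hits) and 4K output tokens on Flash.
cost = estimate_cost_usd("deepseek-v4-flash", 200_000, 4_000, cached_tokens=150_000)
print(f"${cost:.6f} = {usd_to_credits(cost):.2f} credits")
```

Running the same numbers through `"deepseek-v4-pro"` gives a quick side-by-side view of the two tiers.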
The DeepSeek V4 API speaks OpenAI Chat Completions verbatim. Moving an existing OpenAI integration to the DeepSeek V4 API means changing the base URL, the API key, and the model string (`deepseek-v4-flash` or `deepseek-v4-pro`), not rewriting the platform integration. The `messages` array and the streaming format stay the same, and a native Anthropic-style surface is available for SDK callers that prefer it.
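The migration described above can be sketched with nothing but the Python standard library. This is an illustration under assumptions rather than an official client: `build_chat_request` is a hypothetical helper, the base URL is the `api.reapi.ai` endpoint named elsewhere on this page, and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumed base URL, taken from the endpoint named on this page.
BASE_URL = "https://api.reapi.ai/v1"


def build_chat_request(api_key, model, messages, max_tokens=4096, stream=False):
    """Build (but do not send) an OpenAI-style Chat Completions request."""
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


messages = [{"role": "user", "content": "Hello"}]
req_flash = build_chat_request("YOUR_API_KEY", "deepseek-v4-flash", messages)
# Switching tiers only changes the model string.
req_pro = build_chat_request("YOUR_API_KEY", "deepseek-v4-pro", messages)
```

Passing either request to `urllib.request.urlopen` would send it; an existing OpenAI SDK integration should need only the equivalent base URL, key, and model changes.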
The DeepSeek V4 API is open-weight and priced to match. Pro rivals top closed-source models on reasoning, math, and coding while costing a fraction of them per token; Flash drops the price by another order of magnitude for everyday traffic. Run premium work on Pro and route high-volume calls to Flash on the same key.
A single api.reapi.ai key unlocks the DeepSeek V4 API alongside GPT-5.5, Claude Opus 4.8, Gemini, and every other frontier chat model on the platform. Compare vendors, add fallbacks, and route traffic per call with a configuration change instead of an integration project.
The DeepSeek V4 API is a generational jump over V3.2 — a bigger context window, two model tiers, thinking on by default, vision, and agent-focused tuning. Here is what changed between the two.
Comparison reflects publicly documented behavior from DeepSeek's V4 release notes and model documentation at the time of writing. Some benchmark claims are vendor-reported. Model behavior and pricing can change; check the pricing card above and the API docs for current values.
Sign up at api.reapi.ai, open the console, generate an API key under API Keys, and top up tokens under Top Up. The chat workspace is separate from the reapi.ai image/video gateway — keys do not cross over.
POST https://api.reapi.ai/v1/chat/completions with `model` set to `deepseek-v4-flash` (or `deepseek-v4-pro`), your `messages` array, and `max_tokens`. The DeepSeek V4 API endpoint is OpenAI-compatible, including streamed responses; switch models with a one-line change.
Across the DeepSeek V4 API, reach for Flash on latency-sensitive, high-throughput work and Pro when a task needs deep reasoning. Reuse stable system prompts to hit the low cache-read rate, and toggle thinking mode off for the fastest, cheapest replies.
Common questions about this model.
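Because the endpoint is described as OpenAI-compatible for streamed responses, a streamed reply can be reassembled roughly as follows. This sketch assumes the standard OpenAI server-sent-events framing (`data: {...}` lines carrying `choices[].delta.content`, ending with `data: [DONE]`); `iter_stream_text` is a hypothetical helper and the sample lines are illustrative, not captured output.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style streaming lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Illustrative stream; a real one would come from the HTTP response body.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
reply = "".join(iter_stream_text(sample))
```

In a real client, the decoded lines of the response body would replace `sample`.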
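One way to encode the routing and prompt-reuse advice is sketched below. The helper names, the system prompt, and the tie-break rule (latency wins over reasoning depth) are all hypothetical, and the caching benefit assumes the cache matches on an identical leading prefix, which this page does not spell out.

```python
FLASH = "deepseek-v4-flash"
PRO = "deepseek-v4-pro"

# Hypothetical system prompt; keep it identical across calls so the
# shared prefix has a chance of being served from the cache.
SYSTEM_PROMPT = "You are a concise code-review assistant."


def pick_model(needs_deep_reasoning, latency_sensitive=False):
    """Route to Pro only for deep-reasoning work that can tolerate latency."""
    if needs_deep_reasoning and not latency_sensitive:
        return PRO
    return FLASH


def build_messages(user_text):
    """Prepend the same system prompt to every request."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]


request_body = {
    "model": pick_model(needs_deep_reasoning=True),
    "messages": build_messages("Plan the migration from library A to library B."),
    "max_tokens": 4096,
}
```

The parameter that toggles thinking mode is not named on this page, so it is left out of the sketch.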
Explore more models in the same category.
Chat · OpenAI
OpenAI's GPT-5.4 with 1M context and 128K max output, the cost-efficient GPT route.
Chat · Anthropic
Anthropic's Claude Opus 4.7: 1M context, 128K output, premium coding and agent reasoning.
Chat · Anthropic
Anthropic's Claude Sonnet 4.6: balanced quality and speed for everyday production chat, code review, and mid-complexity agents.
Chat · OpenAI
OpenAI's GPT-5.5 with 1M context and 128K max output, behind one OpenAI-compatible reAPI key.
```bash
# Stream a chat completion from DeepSeek V4 Flash through the OpenAI-compatible endpoint.
curl https://api.reapi.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "group": "default",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "stream": true,
    "max_tokens": 4096,
    "temperature": 0.7
  }'
```

Try it in the playground or grab an API key to integrate now.