doubao-seedance-2.0
ByteDance Seedance 2.0 — async video generation. One endpoint, four variants, and implicit mode routing across text, image, first/last-frame, and reference-video / audio inputs.
ByteDance's async video model on reAPI. Four variants share one
endpoint and one parameter shape — pick the variant via model. Mode
is implicit: which media fields you set (prompt, image_urls,
image_with_roles, video_urls, audio_urls) decides whether the
request runs as text-to-video, image-to-video, first/last-frame
transition, or reference-driven generation. 4–15 second outputs at
480p / 720p / 1080p / 4k. See current pricing on the
model page.
Quick example
curl https://reapi.ai/api/v1/videos/generations \
  -H "Authorization: Bearer rk_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-2.0-face",
    "prompt": "A kitten yawning at the camera, cinematic warm tones",
    "resolution": "720p",
    "size": "16:9",
    "duration": 5
  }'

import requests
resp = requests.post(
    "https://reapi.ai/api/v1/videos/generations",
    headers={
        "Authorization": "Bearer rk_live_xxx",
        "Content-Type": "application/json",
    },
    json={
        "model": "doubao-seedance-2.0-face",
        "prompt": "A kitten yawning at the camera, cinematic warm tones",
        "resolution": "720p",
        "size": "16:9",
        "duration": 5,
    },
    timeout=30,
)
print(resp.json())

const r = await fetch("https://reapi.ai/api/v1/videos/generations", {
  method: "POST",
  headers: {
    Authorization: "Bearer rk_live_xxx",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "doubao-seedance-2.0-face",
    prompt: "A kitten yawning at the camera, cinematic warm tones",
    resolution: "720p",
    size: "16:9",
    duration: 5,
  }),
});
console.log(await r.json());

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]any{
        "model":      "doubao-seedance-2.0-face",
        "prompt":     "A kitten yawning at the camera, cinematic warm tones",
        "resolution": "720p",
        "size":       "16:9",
        "duration":   5,
    })
    req, _ := http.NewRequest("POST",
        "https://reapi.ai/api/v1/videos/generations", bytes.NewReader(body))
    req.Header.Set("Authorization", "Bearer rk_live_xxx")
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        // On a transport error resp is nil, so stop before touching resp.Body.
        panic(err)
    }
    defer resp.Body.Close()
    out, _ := io.ReadAll(resp.Body)
    fmt.Println(string(out))
}

Submit response
{
  "id": "task_018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "model": "doubao-seedance-2.0-face",
  "status": "processing",
  "created_at": 1735000000
}

Poll GET /api/v1/tasks/{id} (see the Tasks reference) until
status === "completed". The completed payload's output.video_urls
holds the generated MP4 URL, valid for 7 days. output.last_frame_url
is present when the request set return_last_frame: true.
Authentication
Every call needs a Bearer token. Generate keys at reapi.ai/settings/apikeys.
Authorization: Bearer YOUR_API_KEY

Keys carry the active workspace's billing scope; there is no separate project header.
Endpoint
POST /api/v1/videos/generations
GET /api/v1/tasks/{id}

Submission is async. The POST returns immediately with a task id (the id field of the response); the
task endpoint returns the same envelope until completion. Polling does
not consume credits.
Variants
doubao-seedance-2.0 is a family of two variants sharing one
parameter shape. Pick via model:
| Variant | Speed | 1080p / 4k | Real-person uploads |
|---|---|---|---|
| doubao-seedance-2.0-face | standard | ✅ | ✅ |
| doubao-seedance-2.0-fast-face | faster | ❌ (480p / 720p only) | ✅ |
reAPI never silently substitutes one variant for another. Sending
resolution: "1080p" to the Fast variant returns 400, never an
auto-downgraded clip.
Real-person uploads — both variants accept real-person source images / videos.
Channels
The variants above ship across two channels — same async endpoint,
selected by the model id you send:
| Channel | Model ids | Notes |
|---|---|---|
| Standard | doubao-seedance-2.0-face, doubao-seedance-2.0-fast-face | Face variants accept real-person inputs. |
| Official | doubao-seedance-2.0-official, doubao-seedance-2.0-fast-official | Official direct channel — lower price. Real-person inputs are not accepted. |
Standard and Official share the parameter shape documented below.
Mode routing
doubao-seedance-2.0 picks its mode from which media fields you set —
there is no mode parameter:
| Fields you send | Mode | What it does |
|---|---|---|
| prompt only | T2V | Generate from text |
| prompt + image_urls (1–9) | I2V | Animate / extend from reference images |
| prompt + image_with_roles (1–2 frames) | FRAMES | First / last frame transition |
| prompt + video_urls and / or audio_urls (+ optional image_urls) | REF | Reference-driven, optionally multi-modal |
Mutex rules. Most field combinations are illegal. The only legal
multi-field combinations sit inside REF mode: video_urls and / or
audio_urls, optionally with image_urls (all three together is the full
multi-modal shape). Any other combination is rejected with 400 (code
20003).
- prompt is required on every request (all four modes carry it)
- image_urls ⊕ image_with_roles: never together
- image_with_roles cannot be combined with video_urls or audio_urls
- audio_urls requires image_urls or video_urls
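The routing table and mutex rules above can be mirrored client-side before submitting. The following is a minimal sketch, not reAPI's actual implementation: `infer_mode` is a hypothetical helper, and it assumes that an empty list counts as "field not set", which the page does not state.

```python
def infer_mode(payload: dict) -> str:
    """Return 'T2V', 'I2V', 'FRAMES', or 'REF' for a request body.

    Raises ValueError for combinations the mutex rules reject.
    Assumption: an empty or missing list means the field is not set.
    """
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("prompt is required on every request")

    has_images = bool(payload.get("image_urls"))
    has_roles = bool(payload.get("image_with_roles"))
    has_video = bool(payload.get("video_urls"))
    has_audio = bool(payload.get("audio_urls"))

    if has_images and has_roles:
        raise ValueError("image_urls and image_with_roles cannot be used simultaneously")
    if has_roles and (has_video or has_audio):
        raise ValueError("image_with_roles cannot be combined with video_urls or audio_urls")
    if has_audio and not (has_images or has_video):
        raise ValueError("audio_urls must be used together with image_urls or video_urls")

    if has_roles:
        return "FRAMES"
    if has_video or has_audio:
        return "REF"
    if has_images:
        return "I2V"
    return "T2V"
```

Running a check like this locally only saves a round trip; the server remains the authority on what is accepted.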
Request body
model — required
string. One of the four model ids listed in the Channels table above.
prompt — string, required
Required on every request, min 3 characters, up to 4,000 (≤ 500 recommended — quality drops past ~500 chars on the upstream model). Applies to all modes — T2V, I2V, FRAMES, REF.
Best results come from naming, in order, the subject, the
action, the camera move, and the style. e.g. "A kitten, yawning into the camera, slow push-in, cinematic warm tones".
Failure modes.
- Missing / empty → 400 (code 20002).
- Shorter than 3 chars → 400 (code 20003).
- Longer than 4,000 chars → 400 (code 20003).
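The three failure modes above are easy to pre-check. This is a sketch under one stated assumption: `prompt_error_code` is a hypothetical helper, and it counts characters with Python's `len` (Unicode code points), which may not match how the server counts.

```python
from typing import Optional


def prompt_error_code(prompt) -> Optional[int]:
    """Return the documented error code a prompt would trigger, or None if it passes."""
    if not isinstance(prompt, str) or prompt == "":
        return 20002  # missing / empty
    if len(prompt) < 3:
        return 20003  # shorter than 3 characters
    if len(prompt) > 4000:
        return 20003  # longer than 4,000 characters
    return None
```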
duration — integer, default 5
Output length in seconds. Any integer in [4, 15]. Out-of-range →
400.
Billable seconds = sum(video_urls clip lengths) + duration. The
input reference clips and the generated output both contribute to cost.
Image and audio references don't carry a billable time component —
only video_urls adds. reAPI probes video_urls server-side via
ffmpeg metadata; the value reported by your client is never trusted
for billing. See Pricing for the full formula.
size — string, default "adaptive"
Output ratio. One of:
| Value | Shape |
|---|---|
| 16:9 | Landscape |
| 9:16 | Portrait |
| 1:1 | Square |
| 4:3 | Traditional landscape |
| 3:4 | Traditional portrait |
| 21:9 | Cinematic ultrawide |
| adaptive | Match the input image / video's ratio |
Invalid values → 400 (no silent fallback).
resolution — string, default "720p"
480p / 720p / 1080p / 4k — lowercase only. Drives pricing.
Uppercase forms like 1080P are rejected with 400.
1080p and 4k are variant-gated. Only doubao-seedance-2.0 and
doubao-seedance-2.0-face accept 1080p / 4k. The Fast variants
(-fast, -fast-face) cap at 720p — sending a higher resolution returns
400 resolution=<value> is not supported by <variant> (code 20003),
no auto-downgrade.
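Because there is no auto-downgrade, it can help to catch a gated resolution before submitting. A minimal sketch follows; `check_resolution` is a hypothetical helper, and treating any model id that contains `-fast` as a Fast variant is an assumption based on the suffixes named above.

```python
ALLOWED_RESOLUTIONS = ("480p", "720p", "1080p", "4k")  # lowercase only
FAST_RESOLUTIONS = ("480p", "720p")                    # Fast variants cap at 720p


def check_resolution(model: str, resolution: str = "720p") -> None:
    """Raise ValueError if the request would be rejected for its resolution."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(
            f'invalid resolution "{resolution}" (allowed: 480p / 720p / 1080p / 4k)'
        )
    # Assumption: every Fast variant carries "-fast" in its model id.
    if "-fast" in model and resolution not in FAST_RESOLUTIONS:
        raise ValueError(f"resolution={resolution} is not supported by {model}")
```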
generate_audio — boolean, default true
When true (the default), the model synthesizes an audio track that plays
alongside the generated video; pass false to get a silent clip. Independent
of audio_urls (which is a reference for the model to align with — not a
synthesis toggle).
return_last_frame — boolean, default false
When true, the completed task carries an extra output.last_frame_url
holding the final frame as a still image. Pass it as image_urls of
the next request to chain continuous video without prompt drift.
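The chaining step can be expressed as a small function that turns one completed task into the next request body. This is a sketch only: `next_segment_payload` is a hypothetical helper, and it relies solely on the `status` and `output.last_frame_url` fields described on this page.

```python
def next_segment_payload(completed_task: dict, prompt: str, model: str) -> dict:
    """Build the request body for the next segment of a continuous chain."""
    output = completed_task.get("output") or {}
    last_frame = output.get("last_frame_url")
    if completed_task.get("status") != "completed" or not last_frame:
        raise ValueError(
            "task is not completed, or it was submitted without return_last_frame: true"
        )
    return {
        "model": model,
        "prompt": prompt,
        "image_urls": [last_frame],   # the previous clip's final frame seeds this one
        "return_last_frame": True,    # keep the chain extendable
    }
```

Setting `return_last_frame` on every segment is a design choice here, so that any clip in the chain can be extended later.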
tools — object[]
Per-tool capability list. Today only one type is recognized:

"tools": [{ "type": "web_search" }]

web_search lets the model query the web during generation, which is useful
for current events or named brands. Unknown type values are rejected
with 400 tools[i].type must be "web_search".
nsfw_checker — boolean, default true
Safety checking is enabled by default. Direct API callers can pass
"nsfw_checker": false on the Standard model ids in this page:
{
  "model": "doubao-seedance-2.0-face",
  "prompt": "Your prompt",
  "resolution": "720p",
  "size": "16:9",
  "duration": 5,
  "nsfw_checker": false
}

When set to false, reAPI sends the task directly through the Flexible
channel when that channel is available and compatible with the request.
If no compatible Flexible channel is available, reAPI silently uses the
selected Standard channel instead; Standard channel generation remains
safety-checked. In both cases, no fallback is attached to that request.
image_urls — string[]
Array of public HTTP(S) URLs. Up to 9 entries. Triggers I2V (when
sent without other media fields) or augments REF (when combined with
video_urls / audio_urls).
Mutually exclusive with image_with_roles. Sending more than 9 is
rejected with 400 at most 9 image_urls allowed.
No data: URIs. reAPI rejects base64 inputs platform-wide — every
URL field on this endpoint must be a public HTTP(S) URL. Upload to
your own object storage (S3, R2, OSS, …) and pass the URL.
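A quick local check can reject `data:` URIs and non-HTTP(S) schemes before a request is sent. The sketch below uses only the standard library; `check_media_urls` is a hypothetical helper, and it cannot verify that a URL is actually publicly reachable.

```python
from urllib.parse import urlsplit


def check_media_urls(field: str, urls: list, limit: int) -> None:
    """Raise ValueError if a URL list exceeds its limit or holds a non-HTTP(S) entry."""
    if len(urls) > limit:
        raise ValueError(f"at most {limit} {field} allowed, got {len(urls)}")
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme == "data":
            raise ValueError(
                f"{field} entries must be public URLs; base64 data URIs are not supported"
            )
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"{field} entries must be HTTP(S) URLs, got {url!r}")
```

For example, `check_media_urls("image_urls", urls, 9)` applies the image limit, and the same call with a limit of 3 fits `video_urls` and `audio_urls`.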
image_with_roles — object[]
First / last frame interpolation. Each entry is a {url, role} object:

"image_with_roles": [
  { "url": "https://your-cdn.com/day.jpg", "role": "first_frame" },
  { "url": "https://your-cdn.com/night.jpg", "role": "last_frame" }
]

role is one of first_frame / last_frame. Up to 9 entries
(typical use: 1 or 2 — one first frame and one last frame).
Cannot be combined with image_urls, video_urls, or
audio_urls — these modes are exclusive.
video_urls — string[]
Reference video clips for REF mode. Up to 3 entries; each clip 2–15 s long, combined ≤ 15 s. Each clip's frame must be 300–6000 px on each side, 0.41–8.3 MP total, aspect ratio 0.4–2.5. Public HTTP(S) URLs only.
No real people on standard / fast variants. Use the Face variants
(-face, -fast-face) when the reference clip features identifiable
real people — the non-Face variants reject them upstream.
reAPI probes each clip's resolution and duration server-side via
ffmpeg metadata. Out-of-spec assets surface as a 400:
- Frame outside 300–6000 px/side, 0.41–8.3 MP, or aspect 0.4–2.5 → 400 video_urls[i] resolution WxH is out of range (code 20003)
- Each clip outside 2–15s, or combined > 15s → 400 video_urls total duration X.XXs exceeds the 15s limit (code 20003)
- Probe failure (network / format) → 400 Could not determine source video duration for billing (code 30002), no charge
Mutually exclusive with image_with_roles.
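The clip limits above can be checked locally once you know each clip's dimensions and length (for instance from your own ffprobe run). This is a sketch with labeled assumptions: `check_reference_clips` is a hypothetical helper, megapixels are taken as width × height / 1,000,000, and aspect ratio as width / height; the server's exact definitions are not given here.

```python
def check_reference_clips(clips) -> float:
    """Validate (width_px, height_px, duration_s) tuples; return the combined duration."""
    if len(clips) > 3:
        raise ValueError(f"at most 3 video_urls allowed, got {len(clips)}")
    total = 0.0
    for i, (width, height, duration) in enumerate(clips):
        sides_ok = 300 <= width <= 6000 and 300 <= height <= 6000
        # Assumed definitions: MP = w*h/1e6, aspect = w/h (only computed for sane sides).
        frame_ok = (
            sides_ok
            and 0.41 <= width * height / 1_000_000 <= 8.3
            and 0.4 <= width / height <= 2.5
        )
        if not frame_ok:
            raise ValueError(f"video_urls[{i}] resolution {width}x{height} is out of range")
        if not 2 <= duration <= 15:
            raise ValueError(f"video_urls[{i}] duration {duration:.2f}s is outside 2-15s")
        total += duration
    if total > 15:
        raise ValueError(f"video_urls total duration {total:.2f}s exceeds the 15s limit")
    return total
```

The returned total is also the amount these clips add to billable seconds, so it doubles as a cost preview.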
audio_urls — string[]
Reference audio for REF mode. Up to 3 entries; combined duration ≤ 15 seconds. Public HTTP(S) URLs only.
Must accompany image_urls OR video_urls — a request with
audio_urls and no visual reference is rejected with 400 audio_urls must be used together with image_urls or video_urls.
Mutually exclusive with image_with_roles.
Response envelope
Submit and poll share the same shape — only status and output fill
in over time.
{
  "id": "task_018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "model": "doubao-seedance-2.0-face",
  "status": "completed",
  "created_at": 1735000000,
  "output": {
    "video_urls": ["https://cdn.reapi.ai/media/tasks/018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e/0.mp4"],
    "last_frame_url": "https://cdn.reapi.ai/media/tasks/018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e/0.png"
  },
  "error": null
}

| Field | Type | Notes |
|---|---|---|
| id | string | Task identifier; keep it for polling and audit |
| model | string | Echo of the submitted model (the variant you picked) |
| status | string | processing / completed / failed |
| created_at | integer | Submission unix timestamp |
| output | object \| null | null until completion |
| output.video_urls | string[] | Generated MP4 URL(s), valid for 7 days |
| output.last_frame_url | string \| null | Present only when the request set return_last_frame: true |
| error | object \| null | Populated on failed: { code, message } |
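Reading the envelope comes down to branching on `status`. The sketch below shows one way to do that using only the fields in the table above; `video_urls_from_task` is a hypothetical helper name.

```python
from typing import List, Optional


def video_urls_from_task(task: dict) -> Optional[List[str]]:
    """Return the MP4 URLs of a completed task, None while it is still processing.

    Raises RuntimeError when the task has failed.
    """
    status = task.get("status")
    if status == "failed":
        error = task.get("error") or {}
        raise RuntimeError(
            f"task {task.get('id')} failed: [{error.get('code')}] {error.get('message')}"
        )
    if status != "completed":
        return None  # output is null until completion
    output = task.get("output") or {}
    return list(output.get("video_urls") or [])
```

Since the URLs are valid for 7 days, a caller that needs the file longer would download it after this returns.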
Validation errors
All cases below return HTTP 400 with the noted code. Pattern-match on
code, not message — message strings carry request-specific context
(field names, observed values) and are not a stable contract.
| Trigger | Code | Message |
|---|---|---|
| Missing prompt | 20002 | prompt: Invalid input: expected string, received undefined |
| prompt shorter than 3 chars | 20003 | prompt: Too small: expected string to have >=3 characters |
| prompt longer than 4,000 chars | 20003 | prompt: Too big: expected string to have <=4000 characters |
| image_urls and image_with_roles together | 20003 | image_urls and image_with_roles cannot be used simultaneously |
| image_with_roles + video_urls or audio_urls | 20003 | image_with_roles cannot be combined with video_urls or audio_urls |
| audio_urls without visual reference | 20003 | audio_urls must be used together with image_urls or video_urls |
| image_urls > 9 | 20003 | at most 9 image_urls allowed, got N |
| image_with_roles > 9 | 20003 | at most 9 image_with_roles allowed, got N |
| image_with_roles[i].role invalid | 20003 | image_with_roles[i].role must be first_frame or last_frame, got "..." |
| video_urls > 3 | 20003 | at most 3 video_urls allowed, got N |
| video_urls clip frame out of range (300–6000px / 0.41–8.3MP / aspect 0.4–2.5) | 20003 | video_urls[i] resolution WxH is out of range |
| video_urls combined > 15s | 20003 | video_urls total duration X.XXs exceeds the 15s limit |
| audio_urls > 3 | 20003 | at most 3 audio_urls allowed, got N |
| audio_urls combined > 15s | 20003 | audio_urls total duration X.XXs exceeds the 15s limit |
| duration outside 4–15 | 20003 | duration must be 4-15 seconds, got N |
| Invalid size value | 20005 | invalid size "..." (allowed: 16:9 / 9:16 / 1:1 / 4:3 / 3:4 / 21:9 / adaptive) |
| Invalid resolution value | 20003 | invalid resolution "..." (allowed: 480p / 720p / 1080p / 4k) |
| 1080p on Fast variant | 20003 | resolution=1080p is not supported by <variant> (use doubao-seedance-2.0 or doubao-seedance-2.0-face) |
| tools[i].type not web_search | 20003 | tools[i].type must be "web_search", got "..." |
| Any URL field carrying a data: URI | 20003 | <field> entries must be public URLs; base64 data URIs are not supported |
| Reference video probe fails | 30002 | Could not determine source video duration for billing: ... |
The full envelope is { "error": { "code", "message", "request_id" } } —
see Errors catalog for wire format and request_id
correlation.
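Matching on `code` rather than `message` might look like the sketch below. `describe_error` and its hint texts are illustrative, not part of the API; the codes are the ones listed in the table, and the code is normalized to a string because the wire type (number or string) is not stated here.

```python
# Illustrative hints keyed by the documented codes; wording is this sketch's own.
HINTS = {
    "20002": "a required field is missing",
    "20003": "a field value or field combination is invalid",
    "30002": "a reference video could not be probed; check that video_urls are reachable",
}


def describe_error(envelope: dict) -> str:
    """Turn an error envelope into a short log line keyed on error.code."""
    error = envelope.get("error") or {}
    code = str(error.get("code"))
    hint = HINTS.get(code, "unrecognized code; see the Errors catalog")
    return f"[{code}] {hint} (request_id={error.get('request_id')})"
```

Logging `request_id` alongside the code keeps the line useful for correlation even though the message text is ignored.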
Recipes
T2V — text-to-video
{
  "model": "doubao-seedance-2.0-face",
  "prompt": "A kitten yawning at the camera, slow push-in, warm tones",
  "resolution": "720p",
  "size": "16:9",
  "duration": 5,
  "fallback": { "enabled": false }
}

I2V — single reference image
{
  "model": "doubao-seedance-2.0-face",
  "prompt": "The kitten stands up and walks toward the camera",
  "image_urls": ["https://your-cdn.com/cat.jpg"],
  "duration": 5
}

FRAMES — first / last frame transition
{
  "model": "doubao-seedance-2.0-face",
  "prompt": "Smooth transition from day to night",
  "image_with_roles": [
    { "url": "https://your-cdn.com/day.jpg", "role": "first_frame" },
    { "url": "https://your-cdn.com/night.jpg", "role": "last_frame" }
  ],
  "duration": 5
}

REF — reference video (style transfer)
{
  "model": "doubao-seedance-2.0-face",
  "prompt": "Restylize the reference clip into anime aesthetics",
  "video_urls": ["https://your-cdn.com/reference.mp4"]
}

REF — reference video + reference audio
{
  "model": "doubao-seedance-2.0-face",
  "prompt": "A scene of a person speaking",
  "video_urls": ["https://your-cdn.com/reference.mp4"],
  "audio_urls": ["https://your-cdn.com/speech.wav"],
  "size": "16:9",
  "duration": 11
}

Voiced video (synthesized audio)
{
  "model": "doubao-seedance-2.0-face",
  "prompt": "A man calls out to a woman: \"Remember — never point at the moon with your finger.\"",
  "generate_audio": true
}

Continuous video chain
Step 1 — produce a 5s clip and ask for the last-frame URL:
{
  "model": "doubao-seedance-2.0-face",
  "prompt": "The kitten approaches the camera",
  "image_urls": ["https://your-cdn.com/kitten-start.png"],
  "return_last_frame": true
}

Step 2 — feed output.last_frame_url as image_urls of the next call:
{
  "model": "doubao-seedance-2.0-face",
  "prompt": "The kitten turns and walks away",
  "image_urls": ["<paste output.last_frame_url from step 1>"]
}

Fast variant — quick timelapse
{
  "model": "doubao-seedance-2.0-fast-face",
  "prompt": "City nightscape timelapse",
  "size": "21:9",
  "duration": 8
}

Multi-modal — images + reference video + reference audio
The full REF surface — combine all three reference types for tightly directed product / brand spots.
{
  "model": "doubao-seedance-2.0-face",
  "prompt": "First-person POV product ad with dynamic camera moves",
  "image_urls": [
    "https://your-cdn.com/product-1.jpg",
    "https://your-cdn.com/product-2.jpg"
  ],
  "video_urls": ["https://your-cdn.com/style-ref.mp4"],
  "audio_urls": ["https://your-cdn.com/bgm.mp3"],
  "generate_audio": true,
  "size": "16:9",
  "duration": 11
}

Choosing a variant
| Need | Pick |
|---|---|
| Highest quality, full resolution range | doubao-seedance-2.0-face |
| Cheaper / faster, 720p ceiling | doubao-seedance-2.0-fast-face |
Variants are independent products — reAPI never rewrites your selected
model on the primary attempt. Standard variants can still use the
fallback policy above after an eligible generation-side failure.
Polling pattern
The task endpoint behaves identically to image tasks — only the
completed output shape differs (video_urls / last_frame_url
instead of image_urls). A pragmatic schedule:
0–5 minutes: poll every 5s
5 min – 1 h: back off gradually toward 1 min
≥ 1 h: cap at 3 min between polls

A typical task completes in a few minutes. A single generation attempt can run for up to 48 hours; when fallback is enabled, the overall task window can cover two generation attempts.
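The schedule above could be coded as a delay function plus a loop. This is one possible reading rather than a prescribed algorithm: the linear ramp between 5 minutes and 1 hour is an assumption ("back off gradually" is not specified further), and `fetch_task` is a hypothetical callable that performs GET /api/v1/tasks/{id} and returns the parsed envelope.

```python
import time


def next_delay(elapsed_s: float) -> float:
    """Seconds to wait before the next poll, given time since submission."""
    if elapsed_s < 300:
        return 5.0                        # 0-5 minutes: every 5s
    if elapsed_s < 3600:
        # Assumed shape: linear ramp from 5s to 60s between 5 min and 1 h.
        fraction = (elapsed_s - 300) / (3600 - 300)
        return 5.0 + fraction * 55.0
    return 180.0                          # 1 h and beyond: 3 min between polls


def poll_until_done(fetch_task, task_id, sleep=time.sleep, clock=time.monotonic,
                    max_wait_s=48 * 3600):
    """Poll until the task is completed or failed; return the final envelope."""
    start = clock()
    while True:
        task = fetch_task(task_id)
        if task.get("status") in ("completed", "failed"):
            return task
        elapsed = clock() - start
        if elapsed > max_wait_s:
            raise TimeoutError(f"task {task_id} still processing after {elapsed:.0f}s")
        sleep(next_delay(elapsed))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; the 48-hour default mirrors the single-attempt ceiling mentioned above and would need raising when fallback is enabled.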
Pricing
Per-second × billable seconds, where:
billable_seconds = sum(video_urls clip lengths, server-probed)
                 + duration

video_urls clip lengths are measured server-side via ffmpeg metadata
— client-stated values are never trusted for billing. Image and
audio references don't add to billable time. T2V / I2V / FRAMES
requests (no video_urls) bill on duration alone.
The per-second rate depends on three axes:
- Variant (2 options)
- Resolution (480p / 720p / 1080p / 4k)
- Mode: text (no media references) vs. ref (any of image_urls, image_with_roles, video_urls, audio_urls is set)
REF rates are lower than text rates at every cell. See live numbers on the model page — that table is dynamic and always reflects the current rate.
Bill formula (1 credit = $0.001):
credits = ceil(per_second_usd × billable_seconds × 1000)

Charge on submit; refund automatically on failed. Probe failures
(unreachable / unreadable video_urls) return 400 PRICING_UNAVAILABLE
with no charge.
When fallback is enabled, reAPI reserves the larger of the primary and fallback attempt prices. The final successful task is settled to the winning attempt's price and the difference is refunded automatically. If both attempts fail, the full reserve is refunded.
Worked example. doubao-seedance-2.0 at 720p, REF mode, with a
5-second reference video and duration: 6:
billable_seconds = 5 + 6 = 11
credits = ceil(per_second_usd × 11 × 1000)
The same duration: 6 request without video_urls would bill 6
seconds at the (higher) text rate.
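For a local cost estimate, the bill formula can be written out as below. It is a sketch with assumptions: the rates passed in are placeholders (real numbers live on the model page), the clip lengths are your own measurements rather than the server-probed values used for actual billing, and `Decimal` is used so that binary floating-point error cannot push `ceil` up by one credit.

```python
from decimal import Decimal, ROUND_CEILING


def billable_seconds(duration: int, reference_clip_seconds=()) -> Decimal:
    """sum(video_urls clip lengths) + duration, as an exact decimal."""
    clips = (Decimal(str(s)) for s in reference_clip_seconds)
    return sum(clips, Decimal(duration))


def estimate_credits(per_second_usd: str, duration: int, reference_clip_seconds=()) -> int:
    """credits = ceil(per_second_usd x billable_seconds x 1000), with 1 credit = $0.001."""
    total = Decimal(per_second_usd) * billable_seconds(duration, reference_clip_seconds) * 1000
    return int(total.to_integral_value(rounding=ROUND_CEILING))
```

With a placeholder rate of "0.1" USD per second, the worked example (a 5-second reference clip plus `duration: 6`) gives `estimate_credits("0.1", 6, [5])`, which is 1100 credits for 11 billable seconds.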
Tips
- Prompt motion, not just scene. "Slow push-in, warm tones, shallow depth of field" outperforms a noun-list of what's on screen.
- Sweet-spot duration: 5–10 seconds. Below 5s motion looks choppy; above 10s generation time grows fast.
- Trim reference clips before upload. Both their actual length and your duration count toward the bill. A 2-second snippet is usually enough to convey style; there's no quality bonus for uploading a 15s reference.
- Pick doubao-seedance-2.0-fast for iteration. Fast variants cost noticeably less and miss only the 1080p / 4k tiers, which suits prompt-tuning loops where final quality comes later.
- Real people → Face variants. The non-Face variants reject identifiable real-person assets during generation; switching only requires changing model to the matching -face id.
- Chain continuous video with return_last_frame. Pass the returned URL as image_urls of the next request. No prompt drift between segments.