Seedance 2.0 Character Consistency: References, Voice, Shots

Character consistency is the reason Seedance 2.0 exists in its current shape: the model takes up to 9 images, 3 video clips, and 3 audio tracks as references in a single generation^[1], and ByteDance's own prompt guide documents a syntax for pinning a subject to specific reference images^[2]. Most of the frustration I see in community threads ("same character came back different," "it ignored my second reference") traces to rules that are actually written down and almost never read.

This guide collects those rules from ByteDance's official prompt documentation, the API reference, and Dreamina's own tutorials. It also marks the boundaries honestly: what has no official mechanism (cross-request character locks, seeds) and which promise belongs to which platform (the lipsync question).

TL;DR

Seedance 2.0's input budget is 9 images + 3 videos + 3 audio clips, with videos and audio each capped at 15 seconds total; audio cannot be sent without other content^[1].
The official binding syntax exists: define your subject from a numbered image ("subject@image 1"), and put the most important references first, because order carries weight^[2].
ByteDance recommends 4–5 references, not the maximum, one headshot plus one full-body shot per character, and advises against multi-angle sets of the same face^[2].
There is no seed parameter and no persistent character ID. Cross-video consistency comes from reusing the identical reference set, and from return_last_frame continuation^[1]^[3].
Voice guidance is real, but the lipsync wording is platform-specific: the API docs describe audio references steering the performance^[1]; the explicit "lip-sync" promise appears in Dreamina's documentation, not the API's^[4].
Multi-shot has an official pattern: number your shots ("Shot 1… Shot 2… Shot 3…") and treat exact per-shot timings as unreliable by ByteDance's own warning^[2].

Seedance 2.0's 12-slot reference budget, precisely

Seedance 2.0 accepts references through a content array where each item declares a role. The documented limits: up to 9 reference images (JPEG/PNG/WebP among others, aspect ratio between 0.4 and 2.5, 300 to 6,000 pixels a side, under 30MB each), up to 3 reference videos totaling no more than 15 seconds, and up to 3 audio references also totaling 15 seconds^[1]. Audio is a modifier, not a subject: the API rejects requests that send audio with nothing else^[1].
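
The limits above are easy to check before a request is sent. The following is a minimal sketch of such a pre-flight check, not the API's own validation: the `Ref` class, its field names, and `check_budget` are hypothetical, and it assumes "other content" for audio means at least one image or video reference, which the text above does not spell out.

```python
from dataclasses import dataclass

# Limits quoted in the text above. Field names and "kind" labels are
# illustrative, not the API's own schema.
MAX_IMAGES, MAX_VIDEOS, MAX_AUDIO = 9, 3, 3
MAX_VIDEO_SECONDS = 15.0
MAX_AUDIO_SECONDS = 15.0


@dataclass
class Ref:
    kind: str               # "image", "video", or "audio"
    seconds: float = 0.0    # duration; unused for images
    width: int = 0          # pixels; images only
    height: int = 0         # pixels; images only
    megabytes: float = 0.0  # file size; images only


def check_budget(refs: list[Ref]) -> list[str]:
    """Return human-readable problems; an empty list means the set fits."""
    problems: list[str] = []
    images = [r for r in refs if r.kind == "image"]
    videos = [r for r in refs if r.kind == "video"]
    audio = [r for r in refs if r.kind == "audio"]

    if len(images) > MAX_IMAGES:
        problems.append(f"{len(images)} images exceeds the limit of {MAX_IMAGES}")
    if len(videos) > MAX_VIDEOS:
        problems.append(f"{len(videos)} videos exceeds the limit of {MAX_VIDEOS}")
    if len(audio) > MAX_AUDIO:
        problems.append(f"{len(audio)} audio clips exceeds the limit of {MAX_AUDIO}")
    if sum(v.seconds for v in videos) > MAX_VIDEO_SECONDS:
        problems.append("video references total more than 15 seconds")
    if sum(a.seconds for a in audio) > MAX_AUDIO_SECONDS:
        problems.append("audio references total more than 15 seconds")
    # Assumption: "other content" means at least one image or video reference.
    if audio and not (images or videos):
        problems.append("audio sent with no image or video reference")

    for i, img in enumerate(images, start=1):
        if min(img.width, img.height) < 300 or max(img.width, img.height) > 6000:
            problems.append(f"image {i}: each side must be 300 to 6000 px")
        elif not 0.4 <= img.width / img.height <= 2.5:
            problems.append(f"image {i}: aspect ratio outside 0.4 to 2.5")
        if img.megabytes >= 30:
            problems.append(f"image {i}: file must be under 30 MB")
    return problems
```

Returning a list of problems instead of raising on the first one lets a pipeline report everything wrong with a reference set in a single pass.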

One structural rule catches nearly everyone: first-frame mode and multimodal reference mode are mutually exclusive request shapes^[1]. You either hand the model an exact opening frame to animate, or a pile of references to compose from; mixing the two mental models in one request is the fastest route to "it ignored my image."
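
A small guard can catch the mixed-mode mistake before it reaches the API. This is a sketch under stated assumptions: `reference_audio` is the role name used later in this guide, while `first_frame`, `reference_image`, and `reference_video` are hypothetical names chosen for illustration, as is the `request_mode` helper itself.

```python
# Role names: "reference_audio" appears in this guide; the others are assumed.
FIRST_FRAME_ROLE = "first_frame"
REFERENCE_ROLES = {"reference_image", "reference_video", "reference_audio"}


def request_mode(content: list[dict]) -> str:
    """Classify a content array, refusing arrays that mix the two modes."""
    roles = {item["role"] for item in content if "role" in item}
    has_first_frame = FIRST_FRAME_ROLE in roles
    has_references = bool(roles & REFERENCE_ROLES)

    if has_first_frame and has_references:
        raise ValueError(
            "first-frame mode and multimodal reference mode cannot be combined"
        )
    if has_first_frame:
        return "first_frame"
    if has_references:
        return "multimodal_reference"
    return "text_only"
```

Calling it on every outgoing content array makes the request shape an explicit decision rather than an accident of which assets happened to be attached.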

And the rule that generates the most confusing rejections: real human faces cannot be uploaded directly as references. That restriction and its sanctioned consent-based routes are their own topic, covered in our not-eligible guide.

The official syntax for locking a character

ByteDance's prompt guide is unusually concrete about binding. The canonical pattern defines each subject from a numbered image, with a shorthand the docs render as "subject@image 1"^[2]; the launch blog uses the same @-style reference in its own example prompts^[5], and Dreamina's tutorial mirrors it as @AssetName inside its editor^[6]. The point of the syntax is disambiguation: with several references in play, the prompt says explicitly which image owns the face, which owns the outfit, which owns the location.

Around that syntax, the guide's four working rules^[2]:

Use 4–5 references, not 12. The docs recommend against maxing the slots; every extra asset dilutes attention.
Per character: one headshot plus one full-body image. That pair anchors identity better than a stack of angles.
Skip multi-view sets of the same person. ByteDance explicitly advises against them; they invite identity drift rather than preventing it.
Order by importance. Earlier in the prompt means more influence. Put the character before the location, the location before the mood board.

That is the whole trick most "consistency hack" videos re-sell: define subjects explicitly, feed fewer and better references, order them deliberately.
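
The four rules can be folded into a small prompt builder. The sketch below is one interpretation, not ByteDance's tooling: the `Binding` class and `build_bound_prompt` are hypothetical, and the exact phrase layout ("Mara@image 1 (headshot)") is an assumed expansion of the "subject@image 1" shorthand, so check it against the prompt guide before relying on it.

```python
import warnings
from dataclasses import dataclass

RECOMMENDED_MAX = 5  # the guide's suggested upper bound, not a hard limit


@dataclass
class Binding:
    subject: str     # name used in the prompt, e.g. "Mara"
    owns: str        # what this image contributes, e.g. "headshot"
    importance: int  # 1 = most important


def build_bound_prompt(
    bindings: list[Binding], action: str
) -> tuple[str, list[Binding]]:
    """Order references by importance and emit one binding phrase per image.

    Returns the prompt plus the ordered bindings, so the images can be
    uploaded in the same order the prompt numbers them.
    """
    ordered = sorted(bindings, key=lambda b: b.importance)
    if len(ordered) > RECOMMENDED_MAX:
        warnings.warn(
            f"{len(ordered)} references; the guide recommends 4 to 5"
        )
    # Assumed phrase layout; the guide's shorthand is "subject@image 1".
    phrases = [
        f"{b.subject}@image {n} ({b.owns})"
        for n, b in enumerate(ordered, start=1)
    ]
    prompt = ". ".join(phrases) + ". " + action
    return prompt, ordered
```

Keeping the numbering and the upload order in one function avoids the common mismatch where the prompt says "image 1" but a different file was attached first.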

The same character across many videos

Here is the boundary the documentation draws and no tutorial should blur: Seedance 2.0 has no persistent character ID, no seed input, and no cross-request memory. Neither ByteDance's parameter reference nor reAPI's API surface exposes a seed at all^[1]^[3], so the "does Seedance use seed numbers" question has a clean answer: no. Repeatability is reference-driven, not seed-driven.

What works instead, in production order:

Freeze the reference set. Same character images, same order, same binding phrases, request after request. Seedance 2.0's consistency comes from identical inputs, so store the set alongside your prompts and treat any change as a new character version.

Chain with return_last_frame. The API can hand back the final frame of a generation^[1]; feed it forward as the first frame of the next clip and you get scene-to-scene continuity with zero identity re-rolling at the joins.

Let generated faces re-enter legally. Content the platform generated for your account within the past 30 days is trusted as input even when it contains faces^[7], which is what makes iterative character work possible at all under the face rules.
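
Freezing the reference set is easier to enforce when the set has a fingerprint that changes whenever any input changes. This is a minimal sketch using Python's standard `hashlib`; the function name and the idea of storing the fingerprint next to your prompts are illustrative conventions, not something the API provides.

```python
import hashlib


def reference_set_fingerprint(
    image_bytes: list[bytes], binding_phrases: list[str]
) -> str:
    """Short, stable ID for an ordered reference set plus its binding phrases.

    Order is deliberately significant: the same images in a different order
    count as a different character version.
    """
    h = hashlib.sha256()
    for data in image_bytes:
        # Hash each file first so every image contributes a fixed-size chunk.
        h.update(hashlib.sha256(data).digest())
    for phrase in binding_phrases:
        h.update(phrase.encode("utf-8"))
        h.update(b"\x00")  # separator so phrase boundaries stay unambiguous
    return h.hexdigest()[:16]
```

In practice the bytes would come from something like `Path("mara_head.png").read_bytes()` (a hypothetical file name). Logging the fingerprint with every request makes an accidental reference change visible as a new ID.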

For context, ByteDance's technical report ranks Seedance 2.0 first on subject-consistency evaluation among peers^[5]; the machinery above is how that capability actually gets exercised across a series rather than a single clip.
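
The return_last_frame chaining step in this section reduces to a simple loop. The sketch below shows only the control flow: `generate` is a hypothetical wrapper you would write around your platform's client, assumed to take a prompt and an optional first frame and to return a video handle plus the last frame. What else each request carries is left to that wrapper.

```python
from typing import Callable, Optional

# Hypothetical wrapper signature: (prompt, first_frame) -> (video, last_frame)
GenerateFn = Callable[[str, Optional[bytes]], tuple[str, bytes]]


def chain_clips(prompts: list[str], generate: GenerateFn) -> list[str]:
    """Generate clips in sequence, feeding each last frame forward."""
    videos: list[str] = []
    first_frame: Optional[bytes] = None  # the opening clip has no prior frame
    for prompt in prompts:
        video, last_frame = generate(prompt, first_frame)
        videos.append(video)
        first_frame = last_frame  # becomes the next clip's opening frame
    return videos
```

Passing `generate` in as a parameter keeps the loop testable with a stub and independent of any one platform's request format.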

Voice, music, and the lipsync question

Seedance 2.0's audio references steer three things: the sound of the output, the timing of the performance, and the voice character. The API accepts up to three clips totaling 15 seconds in the reference_audio role^[1], and the practical workflow for "make my character perform this song" is exactly what it sounds like: attach the track (a public URL on API platforms), bind your character from images, and prompt the performance.

On lipsync specifically, the sources split and it is worth being precise. ByteDance's API documentation describes joint audio-video generation and audio-guided output; the term "lip-sync" does not appear. Dreamina's product documentation is the surface that promises it, stating that a voice sample guides voice character and "aligns lip-sync, pacing, expressions"^[4]. In practice the same model family powers both, but if your pipeline contractually needs mouth-accurate sync, test on your own material rather than citing a docs page, because the API-side wording deliberately promises less.

Voice consistency across clips follows the same logic as visual consistency: reuse the same voice sample in the same slot every time. There is no voice ID to pin, so the sample is the ID.

Multi-shot prompts that survive generation

The official pattern for multiple shots in one clip is numbered storyboard prose: "Shot 1: … Shot 2: … Shot 3: …"^[2]. Two warnings straight from the guide: exact per-shot second timings are not reliably honored, so write sequence and emphasis rather than timecodes^[2]; and shot count multiplies complexity, so the fewer-references rule matters double in multi-shot prompts.
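
The numbered pattern is simple enough to generate from a list of shot descriptions. Here is a minimal sketch, with hypothetical helper names, that formats the "Shot N:" prose and flags descriptions that look like they contain second-level timings, since the guide warns those are not reliably honored. The regular expression is a rough heuristic, not an exhaustive timecode detector.

```python
import re

# Rough heuristic for timings such as "3s", "2.5 sec", or "4 seconds".
TIMING = re.compile(r"\b\d+(?:\.\d+)?\s*(?:s|sec|secs|seconds?)\b", re.IGNORECASE)


def storyboard_prompt(shots: list[str]) -> str:
    """Join shot descriptions into 'Shot 1: ... Shot 2: ...' prose."""
    if not shots:
        raise ValueError("at least one shot is required")
    parts = []
    for n, shot in enumerate(shots, start=1):
        text = shot.strip().rstrip(".")  # normalize trailing punctuation
        parts.append(f"Shot {n}: {text}.")
    return " ".join(parts)


def shots_with_timings(shots: list[str]) -> list[int]:
    """Return 1-based indices of shots that appear to specify exact timings."""
    return [n for n, shot in enumerate(shots, start=1) if TIMING.search(shot)]
```

Running `shots_with_timings` before `storyboard_prompt` gives a chance to rewrite a timed instruction as sequence and emphasis instead.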

For anything longer than one generation can hold (Seedance 2.0 caps at 15 seconds^[1]), the chain is storyboard → per-clip prompts with the frozen reference set → return_last_frame joins. That is also the workflow that Seedance 2.5's announced 30-second single-pass generation and segment-level prompt control are designed to collapse; our Seedance 2.5 pre-launch guide tracks what is confirmed there.
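
Splitting a longer piece into clips that each fit under the 15-second cap is plain arithmetic. The sketch below divides the total evenly, which is one reasonable choice among several; the function name is hypothetical, and it does not account for any rule about which clip durations a given platform accepts.

```python
import math


def plan_clip_lengths(total_seconds: float, cap: float = 15.0) -> list[float]:
    """Split a total runtime into the fewest equal clips that fit under cap."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    clip_count = math.ceil(total_seconds / cap)
    return [total_seconds / clip_count] * clip_count
```

A 40-second sequence, for example, comes out as three clips of roughly 13.3 seconds each rather than two full clips and a short tail, which keeps the joins evenly spaced.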

If you want to run all of this over an API, the whole reference surface (images, videos, audio, first frame, last-frame return) is exposed on reAPI's Seedance 2.0 with per-second billing, and reference-mode requests bill lower than pure text-to-video^[3].

FAQ

How many reference images can Seedance 2.0 take?

Up to 9 images, plus 3 videos (15s total) and 3 audio clips (15s total) in one request^[1]. ByteDance's own guidance is to use 4–5 well-chosen assets rather than the maximum^[2].

How do I keep the same person across multiple Seedance 2.0 videos?

Freeze one reference set (headshot + full body), bind it with the subject@image syntax, reuse it identically in every request, and chain scenes with return_last_frame^[1]^[2]. There is no character-lock parameter; the reference set is the lock.

Does Seedance 2.0 use seed number inputs?

No. No seed parameter exists in ByteDance's documented request schema or on reAPI's surface^[1]^[3]. Repeatability comes from identical references and prompts, and it is soft repeatability, not bitwise.

Can Seedance 2.0 lipsync to a Suno song?

Attach the track as an audio reference and prompt the vocal performance; the API documents audio-guided generation^[1], while the explicit lip-sync alignment claim comes from Dreamina's docs^[4]. For release-quality sync, validate on your own footage.

How many audio clips can Seedance 2.0 take?

Three, totaling no more than 15 seconds, and never alone; audio must accompany other content in the request^[1].

How do I prompt multiple shots in one Seedance 2.0 clip?

Numbered storyboard style, "Shot 1 / Shot 2 / Shot 3," per the official guide, which also warns that exact per-shot durations are approximate^[2].

Why did my character's face get rejected?

Real-person faces cannot be uploaded directly as references; that is a model-level rule with documented consent-based exceptions^[7]. Full breakdown in our not-eligible guide.

The consistency playbook on one line

Bind subjects explicitly, feed 4–5 deliberate references with the character first, freeze that set for the whole series, chain clips through the last frame, and reuse the same voice sample when sound matters. Everything above comes from ByteDance's own documentation rather than folklore, and every piece of it runs over one endpoint on reAPI. That is Seedance 2.0 character consistency without the mystery: fewer, better references, used the same way every time.

References

[1] Volcano Engine / BytePlus (ByteDance). Seedance 2.0 API reference: content roles, reference limits, first-frame exclusivity, return_last_frame. Retrieved July 2026 from docs.byteplus.com/en/docs/ModelArk/1520757
[2] Volcano Engine / BytePlus (ByteDance). Seedance 2.0 official prompt guide: subject binding syntax, reference count and ordering, multi-shot pattern. Retrieved July 2026 from docs.byteplus.com/en/docs/ModelArk/2222480
[3] reAPI. Seedance 2.0: API documentation and model page. Retrieved July 2026 from reapi.ai/docs/seedance-2-0
[4] Dreamina (CapCut). Seedance 2.0 tool page: voice sample and lip-sync alignment. Retrieved July 2026 from dreamina.capcut.com/tools/seedance-2-0
[5] ByteDance Seed. Seedance 2.0 launch blog and technical report (arXiv:2604.14148). Retrieved July 2026 from seed.bytedance.com/en/blog/seedance-2-0-official-launch
[6] Dreamina (CapCut). Seedance 2.0 prompt tutorial: @AssetName references and Multiframes. Retrieved July 2026 from dreamina.capcut.com/resource/seedance-2-0-prompt
[7] Volcano Engine / BytePlus (ByteDance). Trusted-input exemptions for generated content and consent-verified assets. Retrieved July 2026 from docs.byteplus.com/en/docs/ModelArk/2291680
