gemini-omni
Gemini Omni — Google's any-to-any video model, exposed through reapi as
a single async endpoint. Mode is implicit: the number of image_urls you
send picks text-to-video, image-to-video, or three-image fusion.
4 to 10 second outputs at 720p, 1080p, or 4K. Per-generation pricing —
see the model page.
Quick example
curl https://reapi.ai/api/v1/videos/generations \
  -H "Authorization: Bearer rk_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-omni",
    "prompt": "A kitten playing piano, slow camera push-in",
    "duration": 6,
    "resolution": "1080p",
    "aspect_ratio": "16:9"
  }'

import requests
resp = requests.post(
    "https://reapi.ai/api/v1/videos/generations",
    headers={
        "Authorization": "Bearer rk_live_xxx",
        "Content-Type": "application/json",
    },
    json={
        "model": "gemini-omni",
        "prompt": "A kitten playing piano, slow camera push-in",
        "duration": 6,
        "resolution": "1080p",
        "aspect_ratio": "16:9",
    },
    timeout=30,
)
print(resp.json())

const r = await fetch("https://reapi.ai/api/v1/videos/generations", {
  method: "POST",
  headers: {
    Authorization: "Bearer rk_live_xxx",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gemini-omni",
    prompt: "A kitten playing piano, slow camera push-in",
    duration: 6,
    resolution: "1080p",
    aspect_ratio: "16:9",
  }),
});
console.log(await r.json());

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":        "gemini-omni",
		"prompt":       "A kitten playing piano, slow camera push-in",
		"duration":     6,
		"resolution":   "1080p",
		"aspect_ratio": "16:9",
	})
	req, _ := http.NewRequest("POST",
		"https://reapi.ai/api/v1/videos/generations", bytes.NewReader(body))
	req.Header.Set("Authorization", "Bearer rk_live_xxx")
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err) // resp is nil on a transport error, so stop before touching it
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}

Submit response
{
  "id": "task_018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "model": "gemini-omni",
  "status": "processing",
  "created_at": 1735000000
}

Poll GET /api/v1/tasks/{id} (see the Tasks reference) until
status === "completed". The completed payload's output.video_urls
holds the generated MP4 URL, valid for 7 days.
Authentication
Every call needs a Bearer token. Generate keys at reapi.ai/settings/apikeys.
Authorization: Bearer YOUR_API_KEY

Keys carry the active workspace's billing scope; there is no separate project header.
Endpoint
POST /api/v1/videos/generations
GET /api/v1/tasks/{id}

Submission is async. The POST returns immediately with a task id (the id
field of the response); the task endpoint returns the same envelope until
completion. Polling does not consume credits.
Mode routing
gemini-omni picks its mode from the count of image_urls you send —
there is no mode parameter:
| image_urls count | Mode | What it does |
|---|---|---|
| 0 (or omitted) | Text-to-video | Generate from a prompt. |
| 1 | Image-to-video | Animate from a single starting frame. |
| 3 | Three-image fusion | Combine three references into one motion shot. |
Unsupported counts.
- 2 is rejected with 400 image_urls cardinality 2 is not supported. Submit 0, 1, or 3; there is no first/last-frame mode on gemini-omni.
- 4 or more is also rejected with 400 image_urls accepts at most 3 entries.
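The routing rule above can be mirrored on the client so an unsupported count fails before a request is sent. This is a minimal sketch, not part of any reAPI SDK: the function name `mode_for` and its error text are hypothetical, and only the count-to-mode mapping comes from the table.

```python
def mode_for(image_urls=None):
    """Return the mode gemini-omni would pick for this image_urls value.

    Hypothetical client-side helper; the API itself has no mode parameter.
    """
    count = len(image_urls or [])
    modes = {
        0: "text-to-video",
        1: "image-to-video",
        3: "three-image fusion",
    }
    if count not in modes:
        # The API rejects 2 and 4+ entries with HTTP 400, so fail early here.
        raise ValueError(
            f"image_urls count {count} is not supported; send 0, 1, or 3"
        )
    return modes[count]
```

Calling `mode_for(body.get("image_urls"))` before submitting gives a local error instead of a round trip that ends in a 400.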
Request body
model — string, required
Must be "gemini-omni".
prompt — string, required
Up to 2,000 characters. Required in every mode (text-to-video, image-to-video, three-image fusion). Empty / whitespace-only prompts are treated as missing.
Failure modes.
- Empty / missing → 400 prompt is required (code 20002).
- Longer than 2,000 chars → 400 prompt exceeds 2000 characters (got N) (code 20007).
duration — integer, default 6
One of 4, 6, 8, 10 seconds. Other values are rejected with
400 duration must be 4, 6, 8, or 10 seconds, got N.
resolution — string, default "720p"
720p / 1080p / 4k. Lowercase is canonical; uppercase forms ("4K")
are accepted and normalized. Drives the per-generation rate — 720p and
1080p share the same price; only 4K is uplifted.
aspect_ratio — string, default "16:9"
Output framing. One of:
| Value | Shape |
|---|---|
| 16:9 | Landscape |
| 9:16 | Portrait |
Unknown ratios are rejected with 400 invalid aspect_ratio.
size — string, alias for aspect_ratio
Accepts the same values as aspect_ratio; the supplier doc lists it as a
separate field. If both are sent, they must match; otherwise the request is
rejected with 400 aspect_ratio and size disagree. The reapi playground does
not surface size; the JSON body still accepts it for parity.
image_urls — string[]
Array of public HTTP(S) URLs. Allowed counts: 0, 1, or 3.
- 0 entries — text-to-video.
- 1 entry — image-to-video; the image is treated as the starting frame.
- 3 entries — three-image fusion. The model combines all three references into one motion shot.
No data: URIs. reAPI rejects base64 inputs platform-wide — every
URL field on this endpoint must be a public HTTP(S) URL. Upload to your
own object storage (S3, R2, OSS, …) and pass the URL.
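The field rules in this section can be checked locally before submitting. The sketch below is an illustration under stated assumptions: `check_request` is a hypothetical helper, its messages are its own wording rather than the server's, and it assumes image_urls entries are strings. The server remains the source of truth.

```python
from urllib.parse import urlparse

ALLOWED_DURATIONS = {4, 6, 8, 10}
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4k"}
ALLOWED_RATIOS = {"16:9", "9:16"}


def check_request(body):
    """Return a list of problems in a gemini-omni request body (empty if none)."""
    problems = []

    prompt = body.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required")
    elif len(prompt) > 2000:
        problems.append(f"prompt exceeds 2000 characters (got {len(prompt)})")

    if body.get("duration", 6) not in ALLOWED_DURATIONS:
        problems.append("duration must be 4, 6, 8, or 10 seconds")

    # Uppercase forms such as "4K" are accepted and normalized by the API.
    if str(body.get("resolution", "720p")).lower() not in ALLOWED_RESOLUTIONS:
        problems.append("invalid resolution")

    ratio, size = body.get("aspect_ratio"), body.get("size")
    if ratio is not None and size is not None and ratio != size:
        problems.append("aspect_ratio and size disagree")
    elif (ratio or size or "16:9") not in ALLOWED_RATIOS:
        problems.append("invalid aspect_ratio")

    urls = body.get("image_urls") or []
    if len(urls) not in (0, 1, 3):
        problems.append(f"image_urls count {len(urls)} is not supported")
    for url in urls:
        parsed = urlparse(url)
        # data: URIs and other non-http(s) schemes are rejected platform-wide.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("image_urls entries must be public http(s) URLs")
            break

    return problems
```

An empty list means the body passes these local checks; it does not guarantee the URLs are publicly reachable.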
Response envelope
Submit and poll share the same shape — only status and output fill in
over time.
{
  "id": "task_018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "model": "gemini-omni",
  "status": "completed",
  "created_at": 1735000000,
  "output": {
    "video_urls": ["https://cdn.reapi.ai/media/tasks/018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e/0.mp4"]
  },
  "error": null
}

| Field | Type | Notes |
|---|---|---|
| id | string | Task identifier; keep it for polling and audit |
| model | string | Always "gemini-omni" (echo of the submitted model) |
| status | string | processing / completed / failed |
| created_at | integer | Submission unix timestamp |
| output | object \| null | null until completion. output.video_urls holds MP4s |
| error | object \| null | Populated on failed: { code, message } |
output.video_urls URLs are valid for 7 days. Re-host to your own
storage if you need them longer.
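Because submit and poll return one shape, a single function can interpret any response from either endpoint. The sketch below assumes the envelope has already been decoded into a dict; `video_urls_or_none` is a hypothetical name, not a reAPI client function.

```python
def video_urls_or_none(task):
    """Interpret a task envelope.

    Returns the list of MP4 URLs when the task is completed, None while it
    is still processing, and raises RuntimeError when the task has failed.
    """
    status = task.get("status")
    if status == "completed":
        output = task.get("output") or {}
        return list(output.get("video_urls", []))
    if status == "failed":
        error = task.get("error") or {}
        raise RuntimeError(
            f"task {task.get('id')} failed: "
            f"{error.get('code')} {error.get('message')}"
        )
    return None  # still processing; output is null until completion
```

Since the URLs expire after 7 days, a caller would typically download or re-host each returned URL soon after this function yields a list.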
Validation errors
All cases below return HTTP 400 with code 20003 unless noted. Pattern-match
on code, not message — message strings carry request-specific context
(field names, observed values, etc.) and are not a stable contract.
| Trigger | Code | Message (illustrative) |
|---|---|---|
| prompt missing or blank | 20002 | gemini-omni: prompt is required |
| prompt longer than 2,000 chars | 20007 | gemini-omni: prompt exceeds 2000 characters (got N) |
| image_urls length 2 | 20003 | gemini-omni: image_urls cardinality 2 is not supported |
| image_urls length > 3 | 20003 | gemini-omni: image_urls accepts at most 3 entries, got N |
| duration not one of 4/6/8/10 | 20003 | gemini-omni: duration must be 4, 6, 8, or 10 seconds, got N |
| Unknown resolution | 20003 | gemini-omni: invalid resolution "X" (allowed: 720p / 1080p / 4k) |
| Unknown aspect_ratio | 20003 | gemini-omni: invalid aspect_ratio "X" (allowed: 16:9 / 9:16) |
| aspect_ratio and size disagree | 20003 | gemini-omni: aspect_ratio "X" and size "Y" disagree |
| image_urls carrying a data: URI or non-http(s) | 20003 | gemini-omni: image_urls entries must be public http(s) URLs |
The full envelope is { "error": { "code", "message", "request_id" } } —
see Errors catalog for the wire format and request_id
correlation tips.
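Matching on code rather than message might look like the sketch below. The three codes come from the table above; the short reason labels, the function name `summarize_error`, and the assumption that code may arrive as either a number or a numeric string are all illustrative.

```python
# Reason labels are this example's own wording, not server text.
KNOWN_CODES = {
    20002: "prompt missing or blank",
    20003: "invalid parameter value",
    20007: "prompt too long",
}


def summarize_error(payload):
    """Reduce an error envelope to its code, a stable reason, and request_id."""
    error = payload.get("error") or {}
    try:
        # Assumption: tolerate both 20003 and "20003" on the wire.
        code = int(error.get("code"))
    except (TypeError, ValueError):
        code = None
    return {
        "code": code,
        "reason": KNOWN_CODES.get(code, "unrecognized error"),
        "request_id": error.get("request_id"),
    }
```

Logging the returned request_id alongside the code keeps the server's message available for support without making control flow depend on its wording.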
Recipes
Text-to-video — minimum request
{
  "model": "gemini-omni",
  "prompt": "A little girl walking down a sunset coastal road"
}

Text-to-video — full parameters
{
  "model": "gemini-omni",
  "prompt": "A kitten playing piano, slow camera push-in, cinematic warm tones",
  "duration": 8,
  "resolution": "1080p",
  "aspect_ratio": "16:9"
}

Image-to-video — animate a single frame
{
  "model": "gemini-omni",
  "prompt": "Bring the scene to life with a gentle camera dolly forward",
  "image_urls": ["https://your-cdn.com/first_frame.jpg"],
  "duration": 6,
  "resolution": "1080p"
}

Three-image fusion
{
  "model": "gemini-omni",
  "prompt": "Compose a 10-second product spot mixing scene, character, and product",
  "image_urls": [
    "https://your-cdn.com/scene.jpg",
    "https://your-cdn.com/character.jpg",
    "https://your-cdn.com/product.jpg"
  ],
  "duration": 10,
  "resolution": "1080p",
  "aspect_ratio": "9:16"
}

4K hero shot
{
  "model": "gemini-omni",
  "prompt": "A neon city street in the rain, slow camera pan, reflections on the asphalt",
  "duration": 4,
  "resolution": "4k",
  "aspect_ratio": "16:9"
}

Choosing a mode
| Need | Send |
|---|---|
| Generate from text | prompt only |
| Animate a still | prompt + image_urls (1 entry) |
| Compose scene + character + product | prompt + image_urls (3 entries) |
| Cut spend | Drop resolution below 4k (720p and 1080p cost the same) and duration to 4 |
| Hero shot | Pick 4k at 4-10s |
Polling pattern
The task endpoint behaves identically to other video tasks — the only
difference is the completed output shape (video_urls instead of
image_urls). A pragmatic schedule:
0–5 minutes: poll every 5s
5 min – 1 h: back off gradually toward 1 min
≥ 1 h: cap at 3 min between polls

A typical task completes in a few minutes. The worker's wall-clock cap is 48 hours, comfortably above any realistic queue.
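One way to turn the schedule above into code is sketched here. The linear ramp between 5 s and 60 s is an assumption (the schedule only says "gradually"), as is jumping straight to the 3-minute cap after the first hour. `fetch_task` is a hypothetical callable that performs the GET on /api/v1/tasks/{id} and returns the decoded envelope; it is injected so the loop can be exercised without a network.

```python
import time


def poll_delay(elapsed_seconds):
    """Seconds to wait before the next poll, given time since submission."""
    if elapsed_seconds < 300:  # first 5 minutes: every 5 s
        return 5.0
    if elapsed_seconds < 3600:  # 5 min to 1 h: ramp linearly from 5 s to 60 s
        fraction = (elapsed_seconds - 300) / (3600 - 300)
        return 5.0 + fraction * 55.0
    return 180.0  # 1 h and beyond: 3 min between polls


def wait_for_task(fetch_task, sleep=time.sleep, clock=time.monotonic,
                  max_wait=48 * 3600):
    """Poll until the task is completed or failed, then return the envelope."""
    start = clock()
    while True:
        task = fetch_task()
        if task.get("status") in ("completed", "failed"):
            return task
        elapsed = clock() - start
        if elapsed >= max_wait:  # mirrors the 48-hour worker cap
            raise TimeoutError("task did not finish within max_wait seconds")
        sleep(poll_delay(elapsed))
```

Passing `sleep` and `clock` as parameters is a testing convenience; production callers can rely on the defaults.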
Pricing
Per generation. 720p and 1080p share the same rate; only 4K is uplifted.
| Resolution | 4s | 6s | 8s | 10s |
|---|---|---|---|---|
| 720p / 1080p | $0.18 | $0.204 | $0.216 | $0.24 |
| 4k | $0.36 | $0.408 | $0.432 | $0.48 |
Bill formula (1 credit = $0.001):
credits = ceil(per_generation_usd × 1000)

Failed jobs refund automatically. See the model page for the live price.
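As a worked instance of the bill formula, the sketch below hard-codes the table above as a snapshot (the model page holds the live price) and computes credits per generation. `credits_for` and the table constants are hypothetical names. Prices are held as `Decimal` rather than float, a choice made here to avoid binary rounding surprises ahead of the `ceil`.

```python
from decimal import Decimal
from math import ceil

# Snapshot of the pricing table, USD per generation, keyed by duration in seconds.
_STANDARD = {4: Decimal("0.18"), 6: Decimal("0.204"),
             8: Decimal("0.216"), 10: Decimal("0.24")}
_UHD = {4: Decimal("0.36"), 6: Decimal("0.408"),
        8: Decimal("0.432"), 10: Decimal("0.48")}
PRICE_USD = {"720p": _STANDARD, "1080p": _STANDARD, "4k": _UHD}


def credits_for(resolution, duration):
    """Credits billed for one generation: ceil(per_generation_usd * 1000)."""
    usd = PRICE_USD[resolution.lower()][duration]
    return ceil(usd * 1000)
```

For example, a 6-second 1080p generation at $0.204 works out to 204 credits, and a 10-second 4K generation at $0.48 to 480 credits.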
Tips
- Prompt motion, not just scene. "Slow push-in, warm tones, shallow depth of field" outperforms a pure noun-list of what's on screen.
- Pick 720p first if you're iterating. It's the same per-generation price as 1080p, but renders faster and lets you change your mind on the final tier without re-doing the bill math.
- Three-image fusion needs cohesive references. Pick three images that share lighting and composition cues — the model fuses them more cleanly than three random shots.
- Pick 4K only when shipping. A 4K render costs 2× the 720p / 1080p rate at every duration; reserve it for the final keeper.