gemini-omni

Gemini Omni — Google's any-input video model. One endpoint covers text-to-video, image-to-video, and three-image fusion at 720p, 1080p, or 4K. Per-generation pricing.

Gemini Omni — Google's any-to-any video model, exposed through reapi as a single async endpoint. Mode is implicit: the number of image_urls you send picks text-to-video, image-to-video, or three-image fusion. 4 to 10 second outputs at 720p, 1080p, or 4K. Per-generation pricing — see the model page.

Quick example

curl https://reapi.ai/api/v1/videos/generations \
  -H "Authorization: Bearer rk_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-omni",
    "prompt": "A kitten playing piano, slow camera push-in",
    "duration": 6,
    "resolution": "1080p",
    "aspect_ratio": "16:9"
  }'

import requests

resp = requests.post(
    "https://reapi.ai/api/v1/videos/generations",
    headers={
        "Authorization": "Bearer rk_live_xxx",
        "Content-Type": "application/json",
    },
    json={
        "model": "gemini-omni",
        "prompt": "A kitten playing piano, slow camera push-in",
        "duration": 6,
        "resolution": "1080p",
        "aspect_ratio": "16:9",
    },
    timeout=30,
)
print(resp.json())

const r = await fetch("https://reapi.ai/api/v1/videos/generations", {
  method: "POST",
  headers: {
    Authorization: "Bearer rk_live_xxx",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gemini-omni",
    prompt: "A kitten playing piano, slow camera push-in",
    duration: 6,
    resolution: "1080p",
    aspect_ratio: "16:9",
  }),
});
console.log(await r.json());

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]any{
        "model":        "gemini-omni",
        "prompt":       "A kitten playing piano, slow camera push-in",
        "duration":     6,
        "resolution":   "1080p",
        "aspect_ratio": "16:9",
    })
    req, _ := http.NewRequest("POST",
        "https://reapi.ai/api/v1/videos/generations", bytes.NewReader(body))
    req.Header.Set("Authorization", "Bearer rk_live_xxx")
    req.Header.Set("Content-Type", "application/json")

    resp, _ := http.DefaultClient.Do(req)
    defer resp.Body.Close()
    out, _ := io.ReadAll(resp.Body)
    fmt.Println(string(out))
}

Submit response

{
  "id": "task_018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "model": "gemini-omni",
  "status": "processing",
  "created_at": 1735000000
}

Poll GET /api/v1/tasks/{id} (see the Tasks reference) until status === "completed". The completed payload's output.video_urls holds the generated MP4 URL, valid for 7 days.

Authentication

Every call needs a Bearer token. Generate keys at reapi.ai/settings/apikeys.

Authorization: Bearer YOUR_API_KEY

Keys carry the active workspace's billing scope — there is no separate project header.

Endpoint

POST /api/v1/videos/generations
GET  /api/v1/tasks/{id}

Submission is async. The POST returns immediately with a task_id; the task endpoint returns the same envelope until completion. Polling does not consume credits.

Mode routing

gemini-omni picks its mode from the count of image_urls you send — there is no mode parameter:

`image_urls` count	Mode	What it does
`0` (or omitted)	Text-to-video	Generate from a prompt.
`1`	Image-to-video	Animate from a single starting frame.
`3`	Three-image fusion	Combine three references into one motion shot.

Unsupported counts.

2 is rejected with 400 image_urls cardinality 2 is not supported. Submit 0, 1, or 3 — there is no first/last-frame mode on gemini-omni.
4 or more is also rejected with 400 image_urls accepts at most 3 entries.

Request body

`model` — required

string. Must be "gemini-omni".

`prompt` — string, required

Up to 2,000 characters. Required in every mode (text-to-video, image-to-video, three-image fusion). Empty / whitespace-only prompts are treated as missing.

Failure modes.

Empty / missing → 400 prompt is required (code 20002).
Longer than 2,000 chars → 400 prompt exceeds 2000 characters (got N) (code 20007).

`duration` — integer, default `6`

One of 4, 6, 8, 10 seconds. Other values are rejected with 400 duration must be 4, 6, 8, or 10 seconds, got N.

`resolution` — string, default `"720p"`

720p / 1080p / 4k. Lowercase is canonical; uppercase forms ("4K") are accepted and normalized. Drives the per-generation rate — 720p and 1080p share the same price; only 4K is uplifted.

`aspect_ratio` — string, default `"16:9"`

Output framing. One of:

Value	Shape
`16:9`	Landscape
`9:16`	Portrait

Unknown ratios are rejected with 400 invalid aspect_ratio.

`size` — string, alias for `aspect_ratio`

The same value the supplier doc lists as a separate field. If both are sent, they must match; otherwise 400 aspect_ratio and size disagree. The reapi playground does not surface size; the JSON body still accepts it for parity.

`image_urls` — string[]

Array of public HTTP(S) URLs. Allowed counts: 0, 1, or 3.

0 entries — text-to-video.
1 entry — image-to-video; the image is treated as the starting frame.
3 entries — three-image fusion. The model combines all three references into one motion shot.

No data: URIs. reAPI rejects base64 inputs platform-wide — every URL field on this endpoint must be a public HTTP(S) URL. Upload to your own object storage (S3, R2, OSS, …) and pass the URL.

Response envelope

Submit and poll share the same shape — only status and output fill in over time.

{
  "id": "task_018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e",
  "model": "gemini-omni",
  "status": "completed",
  "created_at": 1735000000,
  "output": {
    "video_urls": ["https://cdn.reapi.ai/media/tasks/018f5a3a1b6e7d9f8c2b4d6e8f0a2c4e/0.mp4"]
  },
  "error": null
}

Field	Type	Notes
`id`	string	Task identifier — keep it for polling and audit
`model`	string	Always `"gemini-omni"` (echo of the submitted `model`)
`status`	string	`processing` / `completed` / `failed`
`created_at`	integer	Submission unix timestamp
`output`	object \| null	`null` until completion. `output.video_urls` holds MP4s
`error`	object \| null	Populated on `failed` — `{ code, message }`

output.video_urls URLs are valid for 7 days. Re-host to your own storage if you need them longer.

Validation errors

All cases below return HTTP 400 with code 20003 unless noted. Pattern-match on code, not message — message strings carry request-specific context (field names, observed values, etc.) and are not a stable contract.

Trigger	Code	Message (illustrative)
`prompt` missing or blank	`20002`	`gemini-omni: prompt is required`
`prompt` longer than 2,000 chars	`20007`	`gemini-omni: prompt exceeds 2000 characters (got N)`
`image_urls` length `2`	`20003`	`gemini-omni: image_urls cardinality 2 is not supported`
`image_urls` length > `3`	`20003`	`gemini-omni: image_urls accepts at most 3 entries, got N`
`duration` not one of 4/6/8/10	`20003`	`gemini-omni: duration must be 4, 6, 8, or 10 seconds, got N`
Unknown `resolution`	`20003`	`gemini-omni: invalid resolution "X" (allowed: 720p / 1080p / 4k)`
Unknown `aspect_ratio`	`20003`	`gemini-omni: invalid aspect_ratio "X" (allowed: 16:9 / 9:16)`
`aspect_ratio` and `size` disagree	`20003`	`gemini-omni: aspect_ratio "X" and size "Y" disagree`
`image_urls` carrying a `data:` URI or non-http(s)	`20003`	`gemini-omni: image_urls entries must be public http(s) URLs`

The full envelope is { "error": { "code", "message", "request_id" } } — see Errors catalog for the wire format and request_id correlation tips.

Recipes

Text-to-video — minimum request

{
  "model": "gemini-omni",
  "prompt": "A little girl walking down a sunset coastal road"
}

Text-to-video — full parameters

{
  "model": "gemini-omni",
  "prompt": "A kitten playing piano, slow camera push-in, cinematic warm tones",
  "duration": 8,
  "resolution": "1080p",
  "aspect_ratio": "16:9"
}

Image-to-video — animate a single frame

{
  "model": "gemini-omni",
  "prompt": "Bring the scene to life with a gentle camera dolly forward",
  "image_urls": ["https://your-cdn.com/first_frame.jpg"],
  "duration": 6,
  "resolution": "1080p"
}

Three-image fusion

{
  "model": "gemini-omni",
  "prompt": "Compose a 10-second product spot mixing scene, character, and product",
  "image_urls": [
    "https://your-cdn.com/scene.jpg",
    "https://your-cdn.com/character.jpg",
    "https://your-cdn.com/product.jpg"
  ],
  "duration": 10,
  "resolution": "1080p",
  "aspect_ratio": "9:16"
}

4K hero shot

{
  "model": "gemini-omni",
  "prompt": "A neon city street in the rain, slow camera pan, reflections on the asphalt",
  "duration": 4,
  "resolution": "4k",
  "aspect_ratio": "16:9"
}

Choosing a mode

Need	Send
Generate from text	`prompt` only
Animate a still	`prompt + image_urls` (1 entry)
Compose scene + character + product	`prompt + image_urls` (3 entries)
Cut spend	Drop `resolution` to `720p` and `duration` to `4`
Hero shot	Pick `4k` at `4-10s`

Polling pattern

The task endpoint behaves identically to other video tasks — the only difference is the completed output shape (video_urls instead of image_urls). A pragmatic schedule:

0–5 minutes:    poll every 5s
5 min – 1 h:    back off gradually toward 1 min
≥ 1 h:          cap at 3 min between polls

A typical task completes in a few minutes. The worker's wall-clock cap is 48 hours, comfortably above any realistic queue.

Pricing

Per generation. 720p and 1080p share the same rate; only 4K is uplifted.

Resolution	4s	6s	8s	10s
`720p` / `1080p`	$0.18	$0.204	$0.216	$0.24
`4k`	$0.36	$0.408	$0.432	$0.48

Bill formula (1 credit = $0.001):

credits = ceil(per_generation_usd × 1000)

Failed jobs refund automatically. See the model page for the live price.

Tips

Prompt motion, not just scene. "Slow push-in, warm tones, shallow depth of field" outperforms a pure noun-list of what's on screen.
Pick 720p first if you're iterating. It's the same per-generation price as 1080p, but renders faster and lets you change your mind on the final tier without re-doing the bill math.
Three-image fusion needs cohesive references. Pick three images that share lighting and composition cues — the model fuses them more cleanly than three random shots.
Pick 4K only when shipping. A 4K render is roughly 2× the cost of 720p / 1080p; reserve it for the final keeper.