How much does the Gemini Omni API cost?

Gemini Omni API pricing is per generation, not per second. 720p and 1080p share the same rate at each of the 4, 6, 8, and 10 second duration tiers; 4K costs more at every tier. Reference generations are billed at a flat per-generation rate that depends only on resolution. See current per-tier rates in the pricing table on this page. Audio is not part of the reapi-exposed surface, so it never changes the rate. Failed Gemini Omni API jobs refund automatically.

How do I get Gemini Omni API access without a Google Cloud account?

Sign up for reapi, grab an API key, and you can call the Gemini Omni API immediately — no Google Cloud project, no service account, no billing setup. Free credits cover the first few clips. The Gemini Omni API endpoint, request shape, and task polling pattern are the same for every developer.

What input modes does the Gemini Omni API support?

Three modes, picked by the count of image_urls you send: zero is text-to-video, one is image-to-video, and three is three-image fusion. The Gemini Omni API does not accept two images: submit 0, 1, or 3, or the call is rejected at the gateway with a clear 400. Requests with four or more images are also rejected (untested upstream).
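The counting rule above can be mirrored in a small pre-flight check so an unsupported request fails before it is sent. This is a minimal sketch, not part of any reapi SDK: the function name `input_mode` is hypothetical, and only the 0 / 1 / 3 rule comes from this page.

```python
def input_mode(image_urls):
    """Return the input mode implied by the number of image URLs.

    Hypothetical helper: it encodes the 0 / 1 / 3 rule described above and
    raises locally instead of waiting for the gateway to answer with a 400.
    """
    modes = {
        0: "text-to-video",
        1: "image-to-video",
        3: "three-image fusion",
    }
    count = len(image_urls)
    if count not in modes:
        raise ValueError(
            f"unsupported image count {count}: send 0, 1, or 3 image_urls"
        )
    return modes[count]
```

Calling `input_mode` with two URLs raises `ValueError`, which matches the case the gateway rejects.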

What resolutions and durations does the Gemini Omni API support?

The Gemini Omni API supports 720p, 1080p, and 4K at 4, 6, 8, or 10 second durations. Aspect ratios are 16:9 (horizontal) and 9:16 (vertical). Other ratios, durations, and resolutions are rejected client-side before reaching the worker.
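A sketch of how the supported values could be checked before submitting a task. The allowed sets and the defaults (720p, 16:9, 6 seconds) are taken from this page; the function name and the exact string spellings such as "4K" are assumptions, so check them against the API docs.

```python
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4K"}  # spelling of "4K" is assumed
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_DURATIONS = {4, 6, 8, 10}  # seconds


def validate_options(resolution="720p", aspect_ratio="16:9", duration=6):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if resolution not in ALLOWED_RESOLUTIONS:
        errors.append(f"resolution {resolution!r} is not one of 720p, 1080p, 4K")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        errors.append(f"aspect ratio {aspect_ratio!r} is not 16:9 or 9:16")
    if duration not in ALLOWED_DURATIONS:
        errors.append(f"duration {duration!r} is not 4, 6, 8, or 10 seconds")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every invalid field at once.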

Gemini Omni vs Veo 3.1 — which should I pick?

Pick Gemini Omni when you want three-image fusion or 4K on Google's newest video model. Pick Veo 3.1 when you need built-in audio, 8 to 15 second outputs, or the per-second billing model. Both share the same reapi /api/v1/videos/generations endpoint and the same task polling pattern, so swapping is a one-line change in the model field.

Does the Gemini Omni API support audio?

The reapi-exposed surface of the Gemini Omni API does not include audio input or audio output — the clip is silent. If you need built-in audio, use the Veo 3.1 API surface, which mixes dialogue and effects into the same MP4.

Gemini Omni API — Google's Any-Input Video Model

The Gemini Omni API turns a prompt, a single image, or three reference images into a 4 to 10 second clip at 720p, 1080p, or 4K. One endpoint covers text-to-video, image-to-video, and three-image fusion — Google's newest video model, billed per generation.

Input

Prompt (required): ≤ 2000 chars
Resolution: default 720p
Aspect ratio: 16:9 or 9:16, default 16:9
Duration (seconds): default 6, ignored in reference-to-video mode

What you can build with this model

Real-world workflows and production use cases you can build and ship with this model.

Animate a single still with the Gemini Omni API

Pass one reference image and a motion prompt. The Gemini Omni API returns a 4 to 10 second clip from the same endpoint as your text-to-video calls — no model swap, no extra integration. Send a 1080p or 4K request when you want the result production-ready.

Fuse three references in one Gemini Omni API call

Send three reference images alongside a prompt and the Gemini Omni API combines scene, character, and product into a single motion shot. Skip the storyboard, the masking, and the multi-pass compositing — three-image fusion is the most differentiated mode on the Gemini Omni API and ships from the same /api/v1/videos/generations endpoint as text-to-video.
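As an illustration, a fusion request body might be assembled like this. Only `model = gemini-omni`, `image_urls`, the 2000-character prompt limit, and the fact that duration is ignored in reference mode come from this page; the `resolution` and `aspect_ratio` field names and the helper name are assumptions.

```python
MAX_PROMPT_CHARS = 2000  # prompt limit stated on this page


def build_fusion_payload(prompt, image_urls, resolution="720p", aspect_ratio="16:9"):
    """Build a JSON-serializable body for a three-image fusion request.

    Hypothetical helper. Duration is left out on purpose because the page
    says it is ignored in reference-to-video mode.
    """
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt is required and must be at most 2000 characters")
    if len(image_urls) != 3:
        raise ValueError("three-image fusion needs exactly three image_urls")
    return {
        "model": "gemini-omni",
        "prompt": prompt,
        "image_urls": list(image_urls),
        "resolution": resolution,      # assumed field name
        "aspect_ratio": aspect_ratio,  # assumed field name
    }
```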

Text-to-video at 4K via the Gemini Omni API

Describe the scene, pick 4K, and the Gemini Omni API returns a clip at the highest fidelity tier — useful for hero shots, social ads, and landing-page video. Audio is omitted in the reapi surface, so the result drops cleanly into any downstream editor.

Pricing

Credit-based — 1 credit = $0.001 USD. Pay only for completed generations.

Category	Unit	Price (USD)	Credits
720p, 4 seconds	1 generation	$0.495	495
720p, 6 seconds	1 generation	$0.66	660
720p, 8 seconds	1 generation	$0.825	825
720p, 10 seconds	1 generation	$0.99	990
1080p, 4 seconds	1 generation	$0.495	495
1080p, 6 seconds	1 generation	$0.66	660
1080p, 8 seconds	1 generation	$0.825	825
1080p, 10 seconds	1 generation	$0.99	990
4K, 4 seconds	1 generation	$1.155	1155
4K, 6 seconds	1 generation	$1.32	1320
4K, 8 seconds	1 generation	$1.485	1485
4K, 10 seconds	1 generation	$1.65	1650
Reference 720p	1 generation	$1.32	1320
Reference 1080p	1 generation	$1.32	1320
Reference 4K	1 generation	$1.98	1980
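For budgeting, the table can be turned into a lookup. This is a sketch under stated assumptions: the credit values are copied from the table above and may change, the function names are hypothetical, and the `reference` flag simply selects the "Reference" rows without claiming which input modes are billed that way.

```python
from decimal import Decimal

CREDIT_USD = Decimal("0.001")  # 1 credit = $0.001 USD, per this page

# Credits per generation, copied from the pricing table (subject to change).
_HD_TIERS = {4: 495, 6: 660, 8: 825, 10: 990}
STANDARD_CREDITS = {
    "720p": _HD_TIERS,
    "1080p": _HD_TIERS,  # same rate as 720p
    "4K": {4: 1155, 6: 1320, 8: 1485, 10: 1650},
}
REFERENCE_CREDITS = {"720p": 1320, "1080p": 1320, "4K": 1980}


def credits_for(resolution, duration=6, reference=False):
    """Return the credits charged for one completed generation."""
    if reference:
        # Reference rows are flat per generation; duration does not apply.
        return REFERENCE_CREDITS[resolution]
    return STANDARD_CREDITS[resolution][duration]


def usd(credits):
    """Convert credits to US dollars using exact decimal arithmetic."""
    return credits * CREDIT_USD
```

For example, `usd(credits_for("4K", 10))` evaluates to 1.65 dollars, matching the 4K 10-second row. `Decimal` is used so that sums over many generations do not pick up floating-point rounding error.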

Why reAPI

One endpoint, three input modes

The Gemini Omni API picks its mode from the count of image_urls you send. Zero gives you text-to-video, one gives image-to-video, three gives three-image fusion — all on the same /api/v1/videos/generations call, with the same authentication and the same task polling pattern. Two images is not supported; the Gemini Omni API will reject that combination at the gateway with a clear 400.

Per-generation pricing, no surprises

The Gemini Omni API charges per generation, not per second. 720p and 1080p share the same rate; only 4K is uplifted. See current per-tier rates in the pricing table on this page. Failed Gemini Omni API jobs refund automatically — your worker never pays for a result you didn't get.

Access without a Google Cloud account

Skip the Google Cloud onboarding, billing setup, and service-account dance. Sign up for reapi, grab a key, and you can call the Gemini Omni API in under a minute. Same model, same outputs — fewer hoops to ship.

Ship the Gemini Omni API in three steps

step 01
Create an API key
Sign up and grab a key from the dashboard. Free credits cover your first Gemini Omni API calls — no card required.
step 02
Submit a video task
POST to /api/v1/videos/generations with model = gemini-omni. The Gemini Omni API returns a task ID immediately so your worker can move on.
step 03
Poll the result
GET /api/v1/tasks/:id until status is completed. Download the Gemini Omni API output and ship it.
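The submit-then-poll flow in the steps above could look roughly like the following, using only the Python standard library. The two paths and `model = gemini-omni` come from this page. Everything else is an assumption to verify against the API docs: the placeholder `BASE_URL`, Bearer-token authentication, the `failed` status value, and the shape of the JSON responses.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not the real one


def build_submit_request(api_key, prompt, image_urls=(), **options):
    """Build the POST request for step 02 (submit a video task)."""
    payload = {"model": "gemini-omni", "prompt": prompt, **options}
    if image_urls:
        payload["image_urls"] = list(image_urls)
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/videos/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_json(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_task(api_key, task_id, fetch=fetch_json, interval=5.0, max_polls=120):
    """Poll step 03 (GET /api/v1/tasks/:id) until the task completes."""
    request = urllib.request.Request(
        f"{BASE_URL}/api/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    for _ in range(max_polls):
        task = fetch(request)
        status = task.get("status")
        if status == "completed":
            return task
        if status == "failed":  # assumed name for the failure status
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not complete after {max_polls} polls")
```

The `fetch` parameter exists so the polling loop can be exercised with a stub in tests, without touching the network. In real use, pass the response of `fetch_json(build_submit_request(...))` through whatever field carries the task ID and hand that ID to `wait_for_task`.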

Frequently asked questions

Common questions about this model.

What is the Gemini Omni API?

Gemini Omni is Google DeepMind's any-to-any multimodal model family announced at Google I/O 26. The Gemini Omni API in reapi is the video-generation surface of that family: submit a prompt and, optionally, one or three reference images, and the Gemini Omni API returns a 4 to 10 second clip at 720p, 1080p, or 4K. One endpoint covers text-to-video, image-to-video, and three-image fusion.

Related models

Explore more models in the same category.

View all models

Video

Google

VEO 3.1

Veo 3.1 in five channels — audio, 4K, and 15-second remix in one API.

From $0.092 per generation

Video · Recommended

ByteDance

Seedance 2.0

Text/image/audio-to-video — 4 variants, per-second pricing.

From $0.037 per second

Video · Recommended

ByteDance

Seedance 2.5

Next-gen text/image/audio-to-video from ByteDance — coming soon.

Coming soon

Video

Alibaba Cloud Bailian

Happy Horse 1.0

Text, image, reference video, and video edit — one Happy Horse 1.0 API call.

From $0.146 per second

start building

Ready to ship?

Try it in the playground or grab an API key to integrate now.
