Overview
Gemini Omni is a Google multimodal model on APIXO. Use it to create reusable audio assets, create character assets from an image, and generate videos with text, image, video, audio, and character references.

| Capability | Value |
|---|---|
| Model ID | gemini-omni |
| Modes | gemini-omni-audio, gemini-omni-character, gemini-omni-video |
| Audio asset output | audioId |
| Character asset output | characterId |
| Video resolutions | 720p, 1080p, 4k |
| Video duration without video input | 4, 6, 8, 10 seconds |
| Video aspect ratios | 16:9, 9:16 |
| Output | Asset IDs for audio/character modes, video URLs in resultJson.resultUrls for video mode |
Workflow
- Create an audio asset with mode: "gemini-omni-audio" and save the returned audioId.
- Create a character asset with mode: "gemini-omni-character" and save the returned characterId.
- Generate a video with mode: "gemini-omni-video" and optionally pass audio_ids, character_ids, image_urls, or video_urls.

Video tasks return a taskId and should be polled or delivered by webhook.
Endpoint and authentication
Base URL:

| Method | Endpoint | Purpose |
|---|---|---|
| POST | /generateTask/gemini-omni | Submit an audio, character, or video task |
| GET | /statusTask/gemini-omni?taskId={taskId} | Poll task status and retrieve results |
Copy-paste async quickstart
This minimal request submits a video task and returns a taskId.
Save the taskId; you need it to poll for the final video result.
Poll for result
Read resultJson after state becomes success.
Request body
Audio asset
Character asset
Video without video input
Video with video input
Parameters
request_type: Result delivery mode. Use async for polling with statusTask, or callback for webhook delivery. Callback mode is recommended for production video generation.

Required when request_type is callback. Must be a public HTTPS URL that can receive the final task payload. See Webhooks.

Gemini Omni input parameters.
Official base voice options
voice_key is required for gemini-omni-audio. Choose one of these official base voices.
| voice_key | Voice description |
|---|---|
| achernar | Female, soft, high pitch |
| achird | Male, friendly, mid pitch |
| algenib | Male, gravelly, low pitch |
| algieba | Male, easy-going, mid-low pitch |
| alnilam | Male, firm, mid-low pitch |
| aoede | Female, breezy, mid pitch |
| autonoe | Female, bright, mid pitch |
| callirrhoe | Female, easy-going, mid pitch |
| charon | Male, informative, lower pitch |
| despina | Female, smooth, mid pitch |
| enceladus | Male, breathy, lower pitch |
| erinome | Female, clear, mid pitch |
| fenrir | Male, excitable, younger pitch |
| gacrux | Female, mature, mid pitch |
| iapetus | Male, clear, mid-low pitch |
| kore | Female, firm, mid pitch |
| laomedeia | Female, upbeat, mid-high pitch |
| leda | Female, youthful, mid-high pitch |
| orus | Male, firm, mid-low pitch |
| puck | Male, upbeat, mid pitch |
| pulcherrima | Ungendered, forward, mid-high pitch |
| rasalgethi | Male, informative, mid pitch |
| sadachbia | Male, lively, low pitch |
| sadaltager | Male, knowledgeable, mid pitch |
| schedar | Male, even, mid-low pitch |
| sulafat | Female, warm, mid pitch |
| umbriel | Male, smooth, lower pitch |
| vindemiatrix | Female, gentle, mid pitch |
| zephyr | Female, bright, mid-high pitch |
| zubenelgenubi | Male, casual, mid-low pitch |
Response format
Audio asset response
Audio mode returns a final successful result directly.
Save the returned audioId; use it later in audio_ids for character or video requests.
Character asset response
Character mode returns a final successful result directly.
Save the returned characterId; use it later in character_ids for video requests.
Submit task response
POST /generateTask/gemini-omni returns a task ID when a video task is accepted:
API status code. 200 means the task was accepted.
Human-readable status message.
Unique task identifier used with the status endpoint.
Status response fields
Unique task identifier.
Current task state: pending, processing, processing_r2, success, or failed.
Audio asset ID for audio mode.
Audio asset name.
Character asset ID for character mode.
Character asset name.
Character image URL.
JSON string containing generated video URLs. Present when the video state is success.
Machine-readable failure code. Present when state is failed.
Human-readable failure message. Present when state is failed.
Task creation timestamp in Unix milliseconds.
Task completion timestamp in Unix milliseconds. Present after completion.
Processing duration in milliseconds. Present after successful or failed completion when available.
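Because resultJson is described as a JSON string rather than a nested object, it needs a second decoding step before the URLs are usable. The following is a minimal sketch of that step, not an official client. It assumes the caller passes in the object that carries state and resultJson (whether that object sits at the top level of the response or inside a wrapper is not stated here), and extract_video_urls is a hypothetical helper name.

```python
import json


def extract_video_urls(status: dict) -> list[str]:
    """Return the generated video URLs, or an empty list if none are available.

    Assumption: `status` is the object that holds `state` and `resultJson`.
    """
    # resultJson is only documented as present once the task has succeeded.
    if status.get("state") != "success":
        return []
    raw = status.get("resultJson")
    if not raw:
        return []
    # resultJson is a JSON *string*, so it has to be decoded separately.
    result = json.loads(raw)
    return list(result.get("resultUrls", []))
```

Calling it on a pending or failed task returns an empty list, so the same helper can be used on every poll.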
Webhook callback mode
Use callback mode when your backend should receive the final video result automatically instead of polling.
Billing
Audio and character asset modes are free asset-generation modes. Video mode is billed by whether a video input is provided and by the selected resolution.
| Mode | Condition | APIXO price |
|---|---|---|
| gemini-omni-audio | Per request | $0.00 |
| gemini-omni-character | Per request | $0.00 |
| gemini-omni-video | No video_urls, 720p or 1080p | $0.10 / second |
| gemini-omni-video | No video_urls, 4k | $0.20 / second |
| gemini-omni-video | With video_urls, 720p or 1080p | $1.20 / request |
| gemini-omni-video | With video_urls, 4k | $1.80 / request |
Without video input, the price is resolution unit price * duration. For example, 8 seconds at 1080p costs $0.10 * 8 = $0.80. With video input, duration is ignored and billing is fixed per request.
For current route and market comparison pricing, see Pricing.
Latency and polling
Actual latency may vary by prompt complexity, media inputs, selected route, and current queue load.

| Workflow | Recommended first poll | Poll interval |
|---|---|---|
| Audio asset | Usually returns directly | Optional lookup by taskId |
| Character asset | Usually returns directly | Optional lookup by taskId |
| Video generation | 60-90s after task creation | 5-10s |
If you receive 429, slow down requests and retry with backoff. For account-level details, see System APIs.
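The polling guidance for video generation could be implemented roughly as follows. This is a sketch under stated assumptions: fetch_status is a hypothetical callable that performs the GET to the status endpoint and returns the object containing state, the defaults of 60 seconds and 5 seconds are the lower bounds from the table, and the 900-second timeout is an arbitrary illustrative value.

```python
import time
from typing import Callable

TERMINAL_STATES = {"success", "failed"}


def poll_until_done(fetch_status: Callable[[], dict],
                    first_delay: float = 60.0,
                    interval: float = 5.0,
                    timeout: float = 900.0,  # assumption, not a documented limit
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Wait for the first poll, then poll at a fixed interval until a terminal state."""
    sleep(first_delay)
    waited = first_delay
    while True:
        status = fetch_status()
        # pending, processing, and processing_r2 all mean "keep waiting".
        if status.get("state") in TERMINAL_STATES:
            return status
        if waited + interval > timeout:
            raise TimeoutError(f"task not finished after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

The sleep parameter is injectable so the loop can be exercised in tests without real waiting.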
Errors and troubleshooting
HTTP errors
| Code | Meaning | What to do |
|---|---|---|
| 400 | Invalid request body, missing parameter, unsupported mode, unsupported value, or media quota exceeded | Fix the request before retrying |
| 401 | Missing or invalid API key | Check the Authorization header |
| 402 | Insufficient balance or quota | Add balance or switch account/key |
| 403 | Key or route cannot access the model | Check permissions and route strategy |
| 429 | Rate limit or concurrency limit reached | Retry with exponential backoff |
| 500 | Server error | Retry with backoff |
| 502 | Upstream service error | Retry with backoff |
| 504 | Upstream timeout | Retry or use callback mode for long-running video jobs |
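One way to act on the table above is to separate codes that are worth retrying from codes that require a fix first, and to space retries with exponential backoff. The sketch below is illustrative only: the helper names, the 1-second base, and the 60-second cap are assumptions, since the table does not prescribe specific delays.

```python
import random
from typing import Callable

# Codes the table says to retry; 400, 401, 402, and 403 need a fix instead.
RETRYABLE_CODES = {429, 500, 502, 504}


def should_retry(status_code: int) -> bool:
    """True if the HTTP status is one the table recommends retrying."""
    return status_code in RETRYABLE_CODES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff with full jitter.

    `attempt` starts at 0. `base` and `cap` are illustrative assumptions.
    """
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random delay between 0 and the exponential ceiling.
    return ceiling * rng()
```

For 504 on long-running video jobs, switching to callback mode may be a better fix than retrying the same poll.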
Task failure codes
| Fail code | Meaning | What to do |
|---|---|---|
| CONTENT_VIOLATION | Prompt or media input was rejected by safety checks | Change the prompt or input media |
| INVALID_INPUT_URL | A provided input URL cannot be accessed | Use a public, directly reachable URL |
| UPSTREAM_ERROR | Upstream service failed the task | Retry later or contact support with the taskId |
| UNKNOWN_ERROR | The failure could not be mapped to a more specific code | Retry later or contact support with the taskId |
Common validation issues
| Issue | Fix |
|---|---|
| Missing mode | Set one of the supported Gemini Omni modes |
| Missing voice_key in audio mode | Choose one official base voice key |
| Missing name in audio mode | Provide a non-empty display name |
| Missing image_urls in character mode | Provide exactly 1 public image URL |
| Too many audio_ids | Use at most 1 audio asset ID for character requests, or at most 3 audio asset IDs for video requests |
| Multiple video_urls | Use at most 1 video URL |
| Missing duration without video_urls | Set duration to 4, 6, 8, or 10 |
| video_start or video_end without video_urls | Provide video_urls or remove the range fields |
| Multimodal quota exceeds 7 units | Use fewer image URLs, video URLs, or character IDs |
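Most of the checks in the table above can be run client-side before submitting, which avoids a round trip that ends in 400. The following is a minimal sketch, not the server's actual validation logic. It assumes the input parameters are available as one flat dict and that duration is an integer number of seconds; validate_input is a hypothetical name. The 7-unit multimodal quota is not checked because the per-item unit weights are not given in this section.

```python
MODES = {"gemini-omni-audio", "gemini-omni-character", "gemini-omni-video"}


def validate_input(params: dict) -> list[str]:
    """Return a list of validation problems; an empty list means no issue found."""
    mode = params.get("mode")
    if mode not in MODES:
        return ["mode must be one of: " + ", ".join(sorted(MODES))]

    errors: list[str] = []
    audio_ids = params.get("audio_ids") or []

    if mode == "gemini-omni-audio":
        if not params.get("voice_key"):
            errors.append("voice_key is required in audio mode")
        if not str(params.get("name") or "").strip():
            errors.append("name must be a non-empty display name")
    elif mode == "gemini-omni-character":
        if len(params.get("image_urls") or []) != 1:
            errors.append("character mode needs exactly 1 image URL")
        if len(audio_ids) > 1:
            errors.append("character requests accept at most 1 audio asset ID")
    else:  # gemini-omni-video
        video_urls = params.get("video_urls") or []
        if len(audio_ids) > 3:
            errors.append("video requests accept at most 3 audio asset IDs")
        if len(video_urls) > 1:
            errors.append("use at most 1 video URL")
        if not video_urls:
            if params.get("duration") not in (4, 6, 8, 10):
                errors.append("duration must be 4, 6, 8, or 10 without video_urls")
            if params.get("video_start") is not None or params.get("video_end") is not None:
                errors.append("video_start and video_end require video_urls")
    return errors
```

Returning every problem at once, rather than raising on the first, lets a caller fix the whole request in one pass.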