Overview

Gemini Omni is a Google multimodal model on APIXO. Use it to create reusable audio assets, create character assets from an image, and generate videos with text, image, video, audio, and character references.
| Capability | Value |
| --- | --- |
| Model ID | gemini-omni |
| Modes | gemini-omni-audio, gemini-omni-character, gemini-omni-video |
| Audio asset output | audioId |
| Character asset output | characterId |
| Video resolutions | 720p, 1080p, 4k |
| Video duration without video input | 4, 6, 8, 10 seconds |
| Video aspect ratios | 16:9, 9:16 |
| Output | Asset IDs for audio/character modes, video URLs in resultJson.resultUrls for video mode |

Workflow

  1. Create an audio asset with mode: "gemini-omni-audio" and save the returned audioId.
  2. Create a character asset with mode: "gemini-omni-character" and save the returned characterId.
  3. Generate a video with mode: "gemini-omni-video" and optionally pass audio_ids, character_ids, image_urls, or video_urls.
Audio and character modes return the final successful result directly in the submit response. Video mode returns only a taskId; retrieve the final result by polling the status endpoint or by receiving it through a webhook.

Endpoint and authentication

Base URL:
https://api.apixo.ai/api/v1
| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /generateTask/gemini-omni | Submit an audio, character, or video task |
| GET | /statusTask/gemini-omni?taskId={taskId} | Poll task status and retrieve results |
All requests require your APIXO API key:
Authorization: Bearer YOUR_API_KEY
Submit requests also require:
Content-Type: application/json

Copy-paste async quickstart

This minimal request submits a video task and returns a taskId.
curl -X POST "https://api.apixo.ai/api/v1/generateTask/gemini-omni" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "request_type": "async",
    "input": {
      "mode": "gemini-omni-video",
      "prompt": "a cinematic character walking through a neon street at night",
      "resolution": "1080p",
      "duration": "6",
      "aspect_ratio": "16:9"
    }
  }'
Successful response:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678"
  }
}
Save the taskId; you need it to poll for the final video result.

Poll for result

curl -X GET "https://api.apixo.ai/api/v1/statusTask/gemini-omni?taskId=task_12345678" \
  -H "Authorization: Bearer YOUR_API_KEY"
Processing response:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "processing",
    "createTime": 1767965610929
  }
}
Success response:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "success",
    "resultJson": "{\"resultUrls\":[\"https://file.apixo.ai/video.mp4\"]}",
    "createTime": 1767965610929,
    "completeTime": 1767965730929,
    "costTime": 120000
  }
}
Failed response:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "failed",
    "failCode": "CONTENT_VIOLATION",
    "failMsg": "Content does not meet safety guidelines",
    "createTime": 1767965610929,
    "completeTime": 1767965620132
  }
}
Parse resultJson after state becomes success:
const payload = JSON.parse(data.resultJson);
const videoUrls = payload.resultUrls;
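The three status responses above can be folded into one decision: keep polling, return video URLs, or report a failure. The sketch below is one way to do that; `interpretStatus` is a hypothetical helper name, and treating every state other than success or failed as "still running" is an assumption based on the documented state list.

```javascript
// Hypothetical helper: summarize a parsed statusTask response body.
function interpretStatus(body) {
  const data = (body && body.data) || {};

  if (data.state === "success") {
    // resultJson is a JSON-encoded string, so it needs its own parse step.
    const payload = data.resultJson ? JSON.parse(data.resultJson) : {};
    return { done: true, ok: true, videoUrls: payload.resultUrls || [] };
  }

  if (data.state === "failed") {
    return { done: true, ok: false, failCode: data.failCode, failMsg: data.failMsg };
  }

  // Assumption: any other state (pending, processing, processing_r2) means
  // the task is still running and should be polled again later.
  return { done: false };
}
```

A caller would poll until `done` is true, then branch on `ok` to read either `videoUrls` or `failCode` and `failMsg`.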

Request body

Audio asset

{
  "request_type": "async",
  "input": {
    "mode": "gemini-omni-audio",
    "voice_key": "achernar",
    "name": "demo voice",
    "prompt": "warm, calm narrator",
    "preview_text": "Welcome to the demo."
  }
}

Character asset

{
  "request_type": "async",
  "input": {
    "mode": "gemini-omni-character",
    "prompt": "a friendly studio presenter",
    "name": "studio presenter",
    "image_urls": [
      "https://example.com/character.png"
    ],
    "audio_ids": [
      "audio_123"
    ]
  }
}

Video without video input

{
  "request_type": "callback",
  "callback_url": "https://your-server.com/webhooks/apixo",
  "input": {
    "mode": "gemini-omni-video",
    "prompt": "a cinematic character walking through a neon street at night",
    "resolution": "1080p",
    "duration": "6",
    "aspect_ratio": "16:9",
    "character_ids": [
      "character_123"
    ],
    "audio_ids": [
      "audio_123",
      "audio_456",
      "audio_789"
    ]
  }
}

Video with video input

{
  "request_type": "async",
  "input": {
    "mode": "gemini-omni-video",
    "prompt": "replace the room with a neon stage",
    "resolution": "4k",
    "video_urls": [
      "https://example.com/source.mp4"
    ],
    "video_start": 0,
    "video_end": 10
  }
}

Parameters

request_type (string, required, default: "async"): Result delivery mode. Use async for polling with statusTask, or callback for webhook delivery. Callback mode is recommended for production video generation.
callback_url (string): Required when request_type is callback. Must be a public HTTPS URL that can receive the final task payload. See Webhooks.
input (object, required): Gemini Omni input parameters.
For video mode, the total multimodal quota from image_urls, video_urls, and character_ids cannot exceed 7 units. Each image uses 1 unit, each character ID uses 1 unit, and one video URL uses 2 units. audio_ids does not consume this quota.
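The quota rule can be checked on the client before a request is submitted. The following is a minimal sketch of that arithmetic; `mediaQuotaUnits` and `withinMediaQuota` are hypothetical names, and the server remains the authority on what it accepts.

```javascript
// Limit and unit weights taken from the quota rule above.
const MAX_QUOTA_UNITS = 7;

// Hypothetical helper: count quota units used by a video-mode input object.
function mediaQuotaUnits(input) {
  const images = (input.image_urls || []).length;        // 1 unit each
  const characters = (input.character_ids || []).length; // 1 unit each
  const videos = (input.video_urls || []).length;        // 2 units each
  // audio_ids is deliberately ignored: it does not consume this quota.
  return images + characters + videos * 2;
}

function withinMediaQuota(input) {
  return mediaQuotaUnits(input) <= MAX_QUOTA_UNITS;
}
```

For example, three images, two character IDs, and one video URL add up to 3 + 2 + 2 = 7 units, which is exactly at the limit.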

Official base voice options

voice_key is required for gemini-omni-audio. Choose one of these official base voices.
| voice_key | Voice description |
| --- | --- |
| achernar | Female, soft, high pitch |
| achird | Male, friendly, mid pitch |
| algenib | Male, gravelly, low pitch |
| algieba | Male, easy-going, mid-low pitch |
| alnilam | Male, firm, mid-low pitch |
| aoede | Female, breezy, mid pitch |
| autonoe | Female, bright, mid pitch |
| callirrhoe | Female, easy-going, mid pitch |
| charon | Male, informative, lower pitch |
| despina | Female, smooth, mid pitch |
| enceladus | Male, breathy, lower pitch |
| erinome | Female, clear, mid pitch |
| fenrir | Male, excitable, younger pitch |
| gacrux | Female, mature, mid pitch |
| iapetus | Male, clear, mid-low pitch |
| kore | Female, firm, mid pitch |
| laomedeia | Female, upbeat, mid-high pitch |
| leda | Female, youthful, mid-high pitch |
| orus | Male, firm, mid-low pitch |
| puck | Male, upbeat, mid pitch |
| pulcherrima | Ungendered, forward, mid-high pitch |
| rasalgethi | Male, informative, mid pitch |
| sadachbia | Male, lively, low pitch |
| sadaltager | Male, knowledgeable, mid pitch |
| schedar | Male, even, mid-low pitch |
| sulafat | Female, warm, mid pitch |
| umbriel | Male, smooth, lower pitch |
| vindemiatrix | Female, gentle, mid pitch |
| zephyr | Female, bright, mid-high pitch |
| zubenelgenubi | Male, casual, mid-low pitch |

Response format

Audio asset response

Audio mode returns a final successful result directly:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "success",
    "audioId": "audio_123",
    "name": "demo voice",
    "createTime": 1767965610929,
    "completeTime": 1767965614929,
    "costTime": 4000
  }
}
Save audioId; use it later in audio_ids for character or video requests.

Character asset response

Character mode returns a final successful result directly:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "success",
    "characterId": "character_123",
    "characterName": "studio presenter",
    "imageUrl": "https://file.apixo.ai/character.png",
    "createTime": 1767965610929,
    "completeTime": 1767965620929,
    "costTime": 10000
  }
}
Save characterId; use it later in character_ids for video requests.

Submit task response

POST /generateTask/gemini-omni returns a task ID when a video task is accepted:
code (integer): API status code. 200 means the task was accepted.
message (string): Human-readable status message.
data.taskId (string): Unique task identifier used with the status endpoint.

Status response fields

taskId (string): Unique task identifier.
state (string): Current task state: pending, processing, processing_r2, success, or failed.
audioId (string): Audio asset ID for audio mode.
name (string): Audio asset name.
characterId (string): Character asset ID for character mode.
characterName (string): Character asset name.
imageUrl (string): Character image URL.
resultJson (string): JSON string containing generated video URLs. Present when video state is success.
failCode (string): Machine-readable failure code. Present when state is failed.
failMsg (string): Human-readable failure message. Present when state is failed.
createTime (integer): Task creation timestamp in Unix milliseconds.
completeTime (integer): Task completion timestamp in Unix milliseconds. Present after completion.
costTime (integer): Processing duration in milliseconds. Present after successful or failed completion when available.

Webhook callback mode

Use callback mode when your backend should receive the final video result automatically instead of polling.
curl -X POST "https://api.apixo.ai/api/v1/generateTask/gemini-omni" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "request_type": "callback",
    "callback_url": "https://your-server.com/webhooks/apixo",
    "input": {
      "mode": "gemini-omni-video",
      "prompt": "a product reveal shot with smooth camera motion and dramatic lighting",
      "resolution": "720p",
      "duration": "4",
      "aspect_ratio": "16:9"
    }
  }'
See Webhooks for delivery requirements and retry behavior.

Billing

Audio and character asset modes are free. Video mode billing depends on whether a video input is provided and on the selected resolution.
| Mode | Condition | APIXO price |
| --- | --- | --- |
| gemini-omni-audio | Per request | $0.00 |
| gemini-omni-character | Per request | $0.00 |
| gemini-omni-video | No video_urls, 720p or 1080p | $0.10 / second |
| gemini-omni-video | No video_urls, 4k | $0.20 / second |
| gemini-omni-video | With video_urls, 720p or 1080p | $1.20 / request |
| gemini-omni-video | With video_urls, 4k | $1.80 / request |
Without video input, final video billing is resolution unit price * duration. With video input, duration is ignored and billing is fixed per request. For current route and market comparison pricing, see Pricing.
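To make the billing rule concrete, here is a rough cost estimator for video requests. The function names are hypothetical, and the prices are hard-coded from the billing table in this section, so treat the output as an estimate only and check Pricing for current values. Amounts are kept in whole cents to avoid floating-point rounding.

```javascript
// Prices in US cents, copied from the billing table; they may change.
const PER_SECOND_CENTS = { "720p": 10, "1080p": 10, "4k": 20 };
const PER_REQUEST_CENTS = { "720p": 120, "1080p": 120, "4k": 180 };

// Hypothetical helper: estimate the cost of one gemini-omni-video request.
function estimateVideoCostCents(input) {
  if (!(input.resolution in PER_SECOND_CENTS)) {
    throw new Error("unsupported resolution: " + input.resolution);
  }
  const hasVideoInput = (input.video_urls || []).length > 0;
  if (hasVideoInput) {
    // With a video input, duration is ignored and the price is fixed.
    return PER_REQUEST_CENTS[input.resolution];
  }
  // Without a video input: per-second price * duration.
  return PER_SECOND_CENTS[input.resolution] * Number(input.duration);
}

function formatUsd(cents) {
  return "$" + (cents / 100).toFixed(2);
}
```

Under these prices, a 6-second 1080p request without video input comes to 10 cents * 6 = 60 cents, which `formatUsd` renders as "$0.60".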

Latency and polling

Actual latency may vary by prompt complexity, media inputs, selected route, and current queue load.
| Workflow | Recommended first poll | Poll interval |
| --- | --- | --- |
| Audio asset | Usually returns directly | Optional lookup by taskId |
| Character asset | Usually returns directly | Optional lookup by taskId |
| Video generation | 60-90s after task creation | 5-10s |
For production video workloads, use callback mode to avoid frequent polling.
Audio and character modes share a free asset-generation limit of 1000 requests per user per 60 seconds. Video mode is billed separately and does not consume this free asset limit. Rate limits and concurrency can vary by account, API key, and route. If you receive 429, slow down requests and retry with backoff. For account-level details, see System APIs.
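The polling schedule and the 429 guidance can be expressed as two small delay functions. This is a sketch under assumptions: the function names are hypothetical, the 60s and 5s values are the low ends of the recommended ranges, and the backoff base (1s), cap (30s), and use of full jitter are illustrative choices that APIXO does not specify.

```javascript
// Hypothetical helper: delay before the Nth status poll of a video task.
// pollIndex 0 is the first poll after task creation.
function pollDelayMs(pollIndex) {
  return pollIndex === 0 ? 60000 : 5000;
}

// Hypothetical helper: exponential backoff with full jitter for 429 retries.
// attempt 0 is the first retry. baseMs and maxMs are assumed values.
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000, random = Math.random) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: pick a delay uniformly between 0 and the current ceiling.
  return Math.floor(random() * ceiling);
}
```

Jitter spreads retries from many clients over time, so they do not all hit the rate limit again at the same instant.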

Errors and troubleshooting

HTTP errors

| Code | Meaning | What to do |
| --- | --- | --- |
| 400 | Invalid request body, missing parameter, unsupported mode, unsupported value, or media quota exceeded | Fix the request before retrying |
| 401 | Missing or invalid API key | Check the Authorization header |
| 402 | Insufficient balance or quota | Add balance or switch account/key |
| 403 | Key or route cannot access the model | Check permissions and route strategy |
| 429 | Rate limit or concurrency limit reached | Retry with exponential backoff |
| 500 | Server error | Retry with backoff |
| 502 | Upstream service error | Retry with backoff |
| 504 | Upstream timeout | Retry or use callback mode for long-running video jobs |

Task failure codes

| Fail code | Meaning | What to do |
| --- | --- | --- |
| CONTENT_VIOLATION | Prompt or media input was rejected by safety checks | Change the prompt or input media |
| INVALID_INPUT_URL | A provided input URL cannot be accessed | Use a public, directly reachable URL |
| UPSTREAM_ERROR | Upstream service failed the task | Retry later or contact support with the taskId |
| UNKNOWN_ERROR | The failure could not be mapped to a more specific code | Retry later or contact support with the taskId |

Common validation issues

| Issue | Fix |
| --- | --- |
| Missing mode | Set one of the supported Gemini Omni modes |
| Missing voice_key in audio mode | Choose one official base voice key |
| Missing name in audio mode | Provide a non-empty display name |
| Missing image_urls in character mode | Provide exactly 1 public image URL |
| Too many audio_ids | Use at most 1 audio asset ID for character requests, or at most 3 audio asset IDs for video requests |
| Multiple video_urls | Use at most 1 video URL |
| Missing duration without video_urls | Set duration to 4, 6, 8, or 10 |
| video_start or video_end without video_urls | Provide video_urls or remove the range fields |
| Multimodal quota exceeds 7 units | Use fewer image URLs, video URLs, or character IDs |
See Error Codes for the full error reference.
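Most of the validation issues listed above can be caught locally before a request is sent, which avoids a round trip that ends in a 400. The sketch below mirrors those rules; `validateInput` is a hypothetical helper, its messages are illustrative rather than the API's actual error text, and the server may enforce further checks not covered here.

```javascript
const MODES = ["gemini-omni-audio", "gemini-omni-character", "gemini-omni-video"];
const DURATIONS = ["4", "6", "8", "10"];

// Hypothetical pre-flight check for the `input` object.
// Returns a list of problems; an empty list means no known issue was found.
function validateInput(input) {
  const errors = [];
  const len = (key) => (input[key] || []).length;

  if (!MODES.includes(input.mode)) {
    return ["mode must be one of: " + MODES.join(", ")];
  }
  if (input.mode === "gemini-omni-audio") {
    if (!input.voice_key) errors.push("voice_key is required");
    if (typeof input.name !== "string" || input.name.trim() === "") {
      errors.push("name must be a non-empty string");
    }
  }
  if (input.mode === "gemini-omni-character") {
    if (len("image_urls") !== 1) errors.push("image_urls must contain exactly 1 URL");
    if (len("audio_ids") > 1) errors.push("character requests accept at most 1 audio ID");
  }
  if (input.mode === "gemini-omni-video") {
    const hasVideo = len("video_urls") > 0;
    if (len("audio_ids") > 3) errors.push("video requests accept at most 3 audio IDs");
    if (len("video_urls") > 1) errors.push("at most 1 video URL is allowed");
    if (!hasVideo && !DURATIONS.includes(String(input.duration))) {
      errors.push("duration must be 4, 6, 8, or 10 when video_urls is absent");
    }
    if (!hasVideo && (input.video_start !== undefined || input.video_end !== undefined)) {
      errors.push("video_start and video_end require video_urls");
    }
    // Quota: 1 unit per image or character ID, 2 units per video URL.
    const units = len("image_urls") + len("character_ids") + 2 * len("video_urls");
    if (units > 7) errors.push("multimodal quota exceeds 7 units");
  }
  return errors;
}
```

Returning a list instead of throwing on the first problem lets a caller surface every issue at once.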