Gemini Omni

Overview

Gemini Omni is a Google multimodal model on APIXO. Use it to create reusable audio assets, create character assets from an image, and generate videos with text, image, video, audio, and character references.

Capability	Value
Model ID	`gemini-omni`
Modes	`gemini-omni-audio`, `gemini-omni-character`, `gemini-omni-video`
Audio asset output	`audioId`
Character asset output	`characterId`
Video resolutions	`720p`, `1080p`, `4k`
Video duration without video input	`4`, `6`, `8`, `10` seconds
Video aspect ratios	`16:9`, `9:16`
Output	Asset IDs for audio/character modes, video URLs in `resultJson.resultUrls` for video mode

Workflow

1. Create an audio asset with mode: "gemini-omni-audio" and save the returned audioId.
2. Create a character asset with mode: "gemini-omni-character" and save the returned characterId.
3. Generate a video with mode: "gemini-omni-video" and optionally pass audio_ids, character_ids, image_urls, or video_urls.

Audio and character modes return the final result directly in the submit response. Video mode returns a taskId; retrieve the finished video by polling the status endpoint or by receiving a webhook callback.
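The three workflow steps can be sketched as request-body builders, which shows how the IDs flow from one mode into the next. The helper names (audioAssetBody, characterAssetBody, videoBody) are hypothetical and only assemble JSON; the field names come from the request examples on this page, and the fixed resolution, duration, and aspect ratio are arbitrary choices for illustration.

```javascript
// Hypothetical helpers: they only build request bodies, nothing is sent.
function audioAssetBody(voiceKey, name) {
  return {
    request_type: "async",
    input: { mode: "gemini-omni-audio", voice_key: voiceKey, name },
  };
}

function characterAssetBody(prompt, imageUrl, audioId) {
  const input = { mode: "gemini-omni-character", prompt, image_urls: [imageUrl] };
  if (audioId) input.audio_ids = [audioId]; // character requests take at most 1 audio ID
  return { request_type: "async", input };
}

function videoBody(prompt, { characterIds = [], audioIds = [] } = {}) {
  const input = {
    mode: "gemini-omni-video",
    prompt,
    resolution: "1080p",
    duration: "6",
    aspect_ratio: "16:9",
  };
  if (characterIds.length) input.character_ids = characterIds;
  if (audioIds.length) input.audio_ids = audioIds;
  return { request_type: "async", input };
}

// Placeholder IDs standing in for values read from real responses.
const audioId = "audio_123";         // from the audio response: data.audioId
const characterId = "character_123"; // from the character response: data.characterId

const finalBody = videoBody("a presenter greeting the audience", {
  characterIds: [characterId],
  audioIds: [audioId],
});
console.log(JSON.stringify(finalBody, null, 2));
```

Keeping the builders separate from the HTTP call makes each step easy to inspect before anything is submitted.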

Endpoint and authentication

Base URL:

https://api.apixo.ai/api/v1

Method	Endpoint	Purpose
`POST`	`/generateTask/gemini-omni`	Submit an audio, character, or video task
`GET`	`/statusTask/gemini-omni?taskId={taskId}`	Poll task status and retrieve results

All requests require your APIXO API key:

Authorization: Bearer YOUR_API_KEY

Submit requests also require:

Content-Type: application/json
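As a rough sketch of how the base URL, endpoints, and headers fit together, the helpers below assemble the URL and options you would pass to fetch. The function names are hypothetical, and nothing is sent over the network; only the URLs and header names are taken from this page.

```javascript
const BASE_URL = "https://api.apixo.ai/api/v1";

// Builds the URL and fetch() options for a submit request (hypothetical helper).
function buildSubmitRequest(apiKey, body) {
  return {
    url: `${BASE_URL}/generateTask/gemini-omni`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Builds the URL and fetch() options for a status poll (hypothetical helper).
function buildStatusRequest(apiKey, taskId) {
  const query = new URLSearchParams({ taskId }); // URL-encodes the task ID
  return {
    url: `${BASE_URL}/statusTask/gemini-omni?${query}`,
    options: { method: "GET", headers: { Authorization: `Bearer ${apiKey}` } },
  };
}

// Usage, assuming a runtime with a global fetch (for example Node 18+) and an
// environment variable named APIXO_API_KEY (the variable name is an assumption):
//   const { url, options } = buildSubmitRequest(process.env.APIXO_API_KEY, body);
//   const response = await fetch(url, options);
```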

Copy-paste async quickstart

This minimal request submits a video task and returns a taskId.

curl -X POST "https://api.apixo.ai/api/v1/generateTask/gemini-omni" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "request_type": "async",
    "input": {
      "mode": "gemini-omni-video",
      "prompt": "a cinematic character walking through a neon street at night",
      "resolution": "1080p",
      "duration": "6",
      "aspect_ratio": "16:9"
    }
  }'

Successful response:

{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678"
  }
}

Save the taskId; you need it to poll for the final video result.

Poll for result

curl -X GET "https://api.apixo.ai/api/v1/statusTask/gemini-omni?taskId=task_12345678" \
  -H "Authorization: Bearer YOUR_API_KEY"

Processing response:

{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "processing",
    "createTime": 1767965610929
  }
}

Success response:

{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "success",
    "resultJson": "{\"resultUrls\":[\"https://file.apixo.ai/video.mp4\"]}",
    "createTime": 1767965610929,
    "completeTime": 1767965730929,
    "costTime": 120000
  }
}

Failed response:

{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "failed",
    "failCode": "CONTENT_VIOLATION",
    "failMsg": "Content does not meet safety guidelines",
    "createTime": 1767965610929,
    "completeTime": 1767965620132
  }
}

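resultJson is a JSON-encoded string, so parse it once state becomes success:

// data is the data object of a statusTask response whose state is "success".
const payload = JSON.parse(data.resultJson);
const videoUrls = payload.resultUrls; // array of video URL strings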
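To handle all three response shapes shown above in one place, a small interpreter along these lines may help. The function name interpretStatus and the shape of its return value are assumptions made for this sketch; the state names and field names are the ones documented on this page.

```javascript
// Turns the `data` object of a statusTask response into a simple result.
// Hypothetical helper: the return shape is invented for this example.
function interpretStatus(data) {
  switch (data.state) {
    case "success": {
      // resultJson is a JSON string, so it needs its own parse step.
      // It is only present for video tasks, hence the fallback.
      const payload = data.resultJson ? JSON.parse(data.resultJson) : {};
      return { done: true, ok: true, videoUrls: payload.resultUrls ?? [] };
    }
    case "failed":
      return { done: true, ok: false, failCode: data.failCode, failMsg: data.failMsg };
    default:
      // pending, processing, processing_r2: not finished yet
      return { done: false };
  }
}

const sample = {
  taskId: "task_12345678",
  state: "success",
  resultJson: "{\"resultUrls\":[\"https://file.apixo.ai/video.mp4\"]}",
};
console.log(interpretStatus(sample).videoUrls[0]); // https://file.apixo.ai/video.mp4
```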

Request body

Audio asset

{
  "request_type": "async",
  "input": {
    "mode": "gemini-omni-audio",
    "voice_key": "achernar",
    "name": "demo voice",
    "prompt": "warm, calm narrator",
    "preview_text": "Welcome to the demo."
  }
}

Character asset

{
  "request_type": "async",
  "input": {
    "mode": "gemini-omni-character",
    "prompt": "a friendly studio presenter",
    "name": "studio presenter",
    "image_urls": [
      "https://example.com/character.png"
    ],
    "audio_ids": [
      "audio_123"
    ]
  }
}

Video without video input

{
  "request_type": "callback",
  "callback_url": "https://your-server.com/webhooks/apixo",
  "input": {
    "mode": "gemini-omni-video",
    "prompt": "a cinematic character walking through a neon street at night",
    "resolution": "1080p",
    "duration": "6",
    "aspect_ratio": "16:9",
    "character_ids": [
      "character_123"
    ],
    "audio_ids": [
      "audio_123",
      "audio_456",
      "audio_789"
    ]
  }
}

Video with video input

{
  "request_type": "async",
  "input": {
    "mode": "gemini-omni-video",
    "prompt": "replace the room with a neon stage",
    "resolution": "4k",
    "video_urls": [
      "https://example.com/source.mp4"
    ],
    "video_start": 0,
    "video_end": 10
  }
}

Parameters

Parameter	Type	Required	Description
`request_type`	string	Yes (default `async`)	Result delivery mode. Use `async` for polling with `statusTask`, or `callback` for webhook delivery. Callback mode is recommended for production video generation.
`callback_url`	string	When `request_type` is `callback`	Must be a public HTTPS URL that can receive the final task payload. See Webhooks.
`input`	object	Yes	Gemini Omni input parameters, listed below as `input.*`.
`input.mode`	string	Yes	Generation mode. Supported values: `gemini-omni-audio`, `gemini-omni-character`, `gemini-omni-video`.
`input.prompt`	string	Character and video modes	Prompt text. Optional voice description for `gemini-omni-audio`.
`input.voice_key`	string	Audio mode	Official base voice key used to create the audio asset. There is no default value.
`input.name`	string	Audio mode	Asset name. Optional display name for `gemini-omni-character`.
`input.preview_text`	string	No	Preview dialogue for `gemini-omni-audio`.
`input.image_urls`	string[]	Character mode (exactly 1 image)	Public image URLs. Optional reference images for `gemini-omni-video`.
`input.audio_ids`	string[]	No	Audio asset IDs returned by `gemini-omni-audio`. Character requests support up to 1 ID. Video requests support up to 3 IDs.
`input.character_ids`	string[]	No	Character asset IDs returned by `gemini-omni-character`. Up to 3 IDs are supported for video requests.
`input.video_urls`	string[]	No	Video input for `gemini-omni-video`. Up to 1 public video URL is supported.
`input.resolution`	string	Video mode	Supported values: `720p`, `1080p`, `4k`. Numeric aliases `720` and `1080` are accepted and normalized.
`input.duration`	string | integer	Video mode without `video_urls`	Supported values: `4`, `6`, `8`, `10`. Ignored when `video_urls` is provided.
`input.aspect_ratio`	string	No	Output aspect ratio for `gemini-omni-video`. Supported values: `16:9`, `9:16`.
`input.video_start`	integer	No	Start position for the video input. Requires `video_urls` and cannot be negative.
`input.video_end`	integer	No	End position for the video input. Requires `video_urls`, cannot be negative, and must be greater than or equal to `video_start` when both are provided.

Video generation also accepts an optional non-negative integer random seed.

For video mode, the total multimodal quota from image_urls, video_urls, and character_ids cannot exceed 7 units. Each image uses 1 unit, each character ID uses 1 unit, and one video URL uses 2 units. audio_ids does not consume this quota.
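The quota rule is simple arithmetic, and a pre-flight check along these lines can catch an over-quota request before it is submitted. The function and constant names are hypothetical; the unit weights and the limit of 7 come from the paragraph above.

```javascript
const MAX_QUOTA_UNITS = 7;

// Counts multimodal quota units for a video-mode input object:
// 1 unit per image, 1 per character ID, 2 per video URL. audio_ids are not counted.
function quotaUnits(input) {
  const images = (input.image_urls ?? []).length;
  const characters = (input.character_ids ?? []).length;
  const videos = (input.video_urls ?? []).length;
  return images + characters + 2 * videos;
}

function withinQuota(input) {
  return quotaUnits(input) <= MAX_QUOTA_UNITS;
}

const example = {
  image_urls: ["https://example.com/a.png", "https://example.com/b.png", "https://example.com/c.png"],
  character_ids: ["character_123", "character_456"],
  video_urls: ["https://example.com/source.mp4"],
  audio_ids: ["audio_123", "audio_456", "audio_789"],
};
console.log(quotaUnits(example));  // 7  (3 images + 2 characters + 2 for the video)
console.log(withinQuota(example)); // true
```

Adding one more image to this example would bring the total to 8 units and the request would be rejected.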

Official base voice options

voice_key is required for gemini-omni-audio. Choose one of these official base voices.

voice_key	Voice description
`achernar`	Female, soft, high pitch
`achird`	Male, friendly, mid pitch
`algenib`	Male, gravelly, low pitch
`algieba`	Male, easy-going, mid-low pitch
`alnilam`	Male, firm, mid-low pitch
`aoede`	Female, breezy, mid pitch
`autonoe`	Female, bright, mid pitch
`callirrhoe`	Female, easy-going, mid pitch
`charon`	Male, informative, lower pitch
`despina`	Female, smooth, mid pitch
`enceladus`	Male, breathy, lower pitch
`erinome`	Female, clear, mid pitch
`fenrir`	Male, excitable, younger pitch
`gacrux`	Female, mature, mid pitch
`iapetus`	Male, clear, mid-low pitch
`kore`	Female, firm, mid pitch
`laomedeia`	Female, upbeat, mid-high pitch
`leda`	Female, youthful, mid-high pitch
`orus`	Male, firm, mid-low pitch
`puck`	Male, upbeat, mid pitch
`pulcherrima`	Ungendered, forward, mid-high pitch
`rasalgethi`	Male, informative, mid pitch
`sadachbia`	Male, lively, low pitch
`sadaltager`	Male, knowledgeable, mid pitch
`schedar`	Male, even, mid-low pitch
`sulafat`	Female, warm, mid pitch
`umbriel`	Male, smooth, lower pitch
`vindemiatrix`	Female, gentle, mid pitch
`zephyr`	Female, bright, mid-high pitch
`zubenelgenubi`	Male, casual, mid-low pitch

Response format

Audio asset response

Audio mode returns a final successful result directly:

{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "success",
    "audioId": "audio_123",
    "name": "demo voice",
    "createTime": 1767965610929,
    "completeTime": 1767965614929,
    "costTime": 4000
  }
}

Save audioId; use it later in audio_ids for character or video requests.

Character asset response

Character mode returns a final successful result directly:

{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "success",
    "characterId": "character_123",
    "characterName": "studio presenter",
    "imageUrl": "https://file.apixo.ai/character.png",
    "createTime": 1767965610929,
    "completeTime": 1767965620929,
    "costTime": 10000
  }
}

Save characterId; use it later in character_ids for video requests.

Submit task response

POST /generateTask/gemini-omni returns a task ID when a video task is accepted:

Field	Type	Description
`code`	integer	API status code. `200` means the task was accepted.
`message`	string	Human-readable status message.
`data.taskId`	string	Unique task identifier used with the status endpoint.

Status response fields

All fields below are inside the data object.

Field	Type	Description
`taskId`	string	Unique task identifier.
`state`	string	Current task state: `pending`, `processing`, `processing_r2`, `success`, or `failed`.
`audioId`	string	Audio asset ID for audio mode.
`name`	string	Audio asset name.
`characterId`	string	Character asset ID for character mode.
`characterName`	string	Character asset name.
`imageUrl`	string	Character image URL.
`resultJson`	string	JSON string containing generated video URLs. Present when video state is `success`.
`failCode`	string	Machine-readable failure code. Present when state is `failed`.
`failMsg`	string	Human-readable failure message. Present when state is `failed`.
`createTime`	integer	Task creation timestamp in Unix milliseconds.
`completeTime`	integer	Task completion timestamp in Unix milliseconds. Present after completion.
`costTime`	integer	Processing duration in milliseconds. Present after successful or failed completion when available.

Webhook callback mode

Use callback mode when your backend should receive the final video result automatically instead of polling.

curl -X POST "https://api.apixo.ai/api/v1/generateTask/gemini-omni" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "request_type": "callback",
    "callback_url": "https://your-server.com/webhooks/apixo",
    "input": {
      "mode": "gemini-omni-video",
      "prompt": "a product reveal shot with smooth camera motion and dramatic lighting",
      "resolution": "720p",
      "duration": "4",
      "aspect_ratio": "16:9"
    }
  }'

See Webhooks for delivery requirements and retry behavior.

Billing

Audio and character modes are free. Video mode pricing depends on whether a video input is provided and on the selected resolution.

Mode	Condition	APIXO price
`gemini-omni-audio`	Per request	`$0.00`
`gemini-omni-character`	Per request	`$0.00`
`gemini-omni-video`	No `video_urls`, `720p` or `1080p`	`$0.10 / second`
`gemini-omni-video`	No `video_urls`, `4k`	`$0.20 / second`
`gemini-omni-video`	With `video_urls`, `720p` or `1080p`	`$1.20 / request`
`gemini-omni-video`	With `video_urls`, `4k`	`$1.80 / request`

Without video input, the price is the per-second rate for the selected resolution multiplied by duration. For example, a 6-second 1080p video costs 6 × $0.10 = $0.60. With video input, duration is ignored and the price is fixed per request. For current route and market comparison pricing, see Pricing.
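A cost estimate can be derived directly from the billing table. The sketch below hard-codes the prices listed above, so it will drift if pricing changes; the function and constant names are hypothetical. Amounts are kept in cents internally to avoid floating-point rounding noise.

```javascript
// Prices in US cents, copied from the billing table on this page.
const CENTS_PER_SECOND = { "720p": 10, "1080p": 10, "4k": 20 };
const CENTS_PER_REQUEST_WITH_VIDEO = { "720p": 120, "1080p": 120, "4k": 180 };

// Estimates the price in USD of one gemini-omni-video request.
function estimateVideoCostUsd(input) {
  const hasVideo = Array.isArray(input.video_urls) && input.video_urls.length > 0;
  const table = hasVideo ? CENTS_PER_REQUEST_WITH_VIDEO : CENTS_PER_SECOND;
  const unit = table[input.resolution];
  if (unit === undefined) throw new Error(`Unknown resolution: ${input.resolution}`);
  // With a video input the price is flat and duration is ignored.
  const cents = hasVideo ? unit : unit * Number(input.duration);
  return cents / 100;
}

console.log(estimateVideoCostUsd({ resolution: "1080p", duration: "6" })); // 0.6
console.log(estimateVideoCostUsd({ resolution: "4k", duration: 10 }));     // 2
console.log(
  estimateVideoCostUsd({ resolution: "4k", video_urls: ["https://example.com/source.mp4"] })
); // 1.8
```

This sketch expects resolution as one of the string values 720p, 1080p, or 4k and does not handle the numeric aliases.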

Latency and polling

Actual latency may vary by prompt complexity, media inputs, selected route, and current queue load.

Workflow	Recommended first poll	Poll interval
Audio asset	Usually returns directly	Optional lookup by `taskId`
Character asset	Usually returns directly	Optional lookup by `taskId`
Video generation	60-90s after task creation	5-10s

For production video workloads, use callback mode to avoid frequent polling.

Audio and character modes share a free asset-generation limit of 1000 requests per user per 60 seconds. Video mode is billed separately and does not consume this free asset limit. Rate limits and concurrency can vary by account, API key, and route. If you receive 429, slow down requests and retry with backoff. For account-level details, see System APIs.
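The polling guidance in this section could be implemented roughly as follows. This is a sketch under assumptions: pollUntilDone, its options, and the default attempt budget are invented for illustration, and the status fetcher and sleep function are passed in so the loop can be exercised without network access. The default delays follow the table above (first poll after 60 seconds, then every 5 seconds).

```javascript
// Real timer-based sleep for production use.
const sleepMs = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the task reaches a terminal state or the attempt budget runs out.
// fetchStatus(taskId) must resolve to the `data` object of a statusTask response.
async function pollUntilDone(taskId, fetchStatus, sleep = sleepMs, opts = {}) {
  const { firstDelayMs = 60000, intervalMs = 5000, maxAttempts = 60 } = opts;
  await sleep(firstDelayMs); // recommended first poll: 60-90s after task creation
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const data = await fetchStatus(taskId);
    if (data.state === "success" || data.state === "failed") return data;
    await sleep(intervalMs); // still pending or processing
  }
  throw new Error(`Task ${taskId} did not finish within ${maxAttempts} polls`);
}

// Usage sketch; getStatusData is a hypothetical function that calls
// GET /statusTask/gemini-omni and returns the parsed response's `data` field:
//   const data = await pollUntilDone("task_12345678", getStatusData);
```

Bounding the number of attempts keeps a stuck task from polling forever; callback mode avoids the loop entirely.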

Errors and troubleshooting

HTTP errors

Code	Meaning	What to do
`400`	Invalid request body, missing parameter, unsupported mode, unsupported value, or media quota exceeded	Fix the request before retrying
`401`	Missing or invalid API key	Check the `Authorization` header
`402`	Insufficient balance or quota	Add balance or switch account/key
`403`	Key or route cannot access the model	Check permissions and route strategy
`429`	Rate limit or concurrency limit reached	Retry with exponential backoff
`500`	Server error	Retry with backoff
`502`	Upstream service error	Retry with backoff
`504`	Upstream timeout	Retry or use callback mode for long-running video jobs
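The retry advice in the table above can be encoded as a small policy. The set of retryable codes follows the table; the helper names, the 1-second base delay, and the 30-second cap are assumptions chosen for illustration. Production code often adds random jitter, which is left out here to keep the example deterministic.

```javascript
// Status codes the table above says to retry.
const RETRYABLE_STATUS = new Set([429, 500, 502, 504]);

function isRetryable(status) {
  return RETRYABLE_STATUS.has(status);
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
// `attempt` counts from 0 for the first retry.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

console.log(isRetryable(429));  // true
console.log(isRetryable(400));  // false
console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n))); // [ 1000, 2000, 4000, 8000 ]
```

Codes such as 400, 401, 402, and 403 are deliberately not retried, since the request or account has to change before a retry can succeed.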

Task failure codes

Fail code	Meaning	What to do
`CONTENT_VIOLATION`	Prompt or media input was rejected by safety checks	Change the prompt or input media
`INVALID_INPUT_URL`	A provided input URL cannot be accessed	Use a public, directly reachable URL
`UPSTREAM_ERROR`	Upstream service failed the task	Retry later or contact support with the `taskId`
`UNKNOWN_ERROR`	The failure could not be mapped to a more specific code	Retry later or contact support with the `taskId`

Common validation issues

Issue	Fix
Missing `mode`	Set one of the supported Gemini Omni modes
Missing `voice_key` in audio mode	Choose one official base voice key
Missing `name` in audio mode	Provide a non-empty display name
Missing `image_urls` in character mode	Provide exactly 1 public image URL
Too many `audio_ids`	Use at most 1 audio asset ID for character requests, or at most 3 audio asset IDs for video requests
Multiple `video_urls`	Use at most 1 video URL
Missing `duration` without `video_urls`	Set `duration` to `4`, `6`, `8`, or `10`
`video_start` or `video_end` without `video_urls`	Provide `video_urls` or remove the range fields
Multimodal quota exceeds 7 units	Use fewer image URLs, video URLs, or character IDs
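Most of the issues in the table above can be caught client-side before a request is sent. The following is a minimal sketch of such a check, not the server's actual validation logic; validateInput and its messages are hypothetical, and the rules are transcribed from the parameter descriptions on this page.

```javascript
const MODES = ["gemini-omni-audio", "gemini-omni-character", "gemini-omni-video"];
const RESOLUTIONS = ["720p", "1080p", "4k", "720", "1080"]; // includes numeric aliases
const count = (value) => (Array.isArray(value) ? value.length : 0);

// Returns a list of problems found in an `input` object; empty means none found.
function validateInput(input) {
  if (!MODES.includes(input.mode)) return ["mode must be one of: " + MODES.join(", ")];
  const issues = [];

  if (input.mode === "gemini-omni-audio") {
    if (!input.voice_key) issues.push("voice_key is required in audio mode");
    if (!input.name) issues.push("name is required in audio mode");
  }

  if (input.mode === "gemini-omni-character") {
    if (!input.prompt) issues.push("prompt is required in character mode");
    if (count(input.image_urls) !== 1) issues.push("image_urls must contain exactly 1 URL");
    if (count(input.audio_ids) > 1) issues.push("character requests accept at most 1 audio ID");
  }

  if (input.mode === "gemini-omni-video") {
    const hasVideo = count(input.video_urls) > 0;
    if (!input.prompt) issues.push("prompt is required in video mode");
    if (!RESOLUTIONS.includes(String(input.resolution))) issues.push("resolution must be 720p, 1080p, or 4k");
    if (count(input.audio_ids) > 3) issues.push("video requests accept at most 3 audio IDs");
    if (count(input.character_ids) > 3) issues.push("video requests accept at most 3 character IDs");
    if (count(input.video_urls) > 1) issues.push("video_urls accepts at most 1 URL");
    if (!hasVideo && ![4, 6, 8, 10].includes(Number(input.duration))) {
      issues.push("duration must be 4, 6, 8, or 10 when video_urls is absent");
    }
    const hasRange = input.video_start !== undefined || input.video_end !== undefined;
    if (hasRange && !hasVideo) issues.push("video_start and video_end require video_urls");
    if (input.video_start < 0 || input.video_end < 0) issues.push("video_start and video_end cannot be negative");
    if (input.video_start !== undefined && input.video_end < input.video_start) {
      issues.push("video_end must be greater than or equal to video_start");
    }
    const units = count(input.image_urls) + count(input.character_ids) + 2 * count(input.video_urls);
    if (units > 7) issues.push("multimodal quota exceeds 7 units");
  }
  return issues;
}

console.log(validateInput({ mode: "gemini-omni-audio", name: "demo voice" }));
// [ 'voice_key is required in audio mode' ]
```

Returning every problem at once, rather than stopping at the first, lets a caller fix a request in a single pass.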

See Error Codes for the full error reference.
