
Overview

CosyVoice 3 Flash is an Alibaba audio model for text-to-speech plus custom voice creation. Use this page when you are ready to call the API after trying the model in the APIXO playground.
| Capability | Value |
| --- | --- |
| Model ID | cosyvoice-3-flash |
| Modes | speech, clone, design |
| Built-in voices | Yes |
| Custom voices | Yes |
| Speech prompt length | 1-20000 characters |
| Design prompt length | 1-500 characters |
| Preview text length | 1-200 characters |
| Clone audio URLs | Exactly 1 URL |

Endpoint and authentication

Base URL:
https://api.apixo.ai/api/v1
| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /generateTask/cosyvoice-3-flash | Submit a speech, clone, or design task |
| GET | /statusTask/cosyvoice-3-flash?taskId={taskId} | Poll task status and retrieve results |
All requests require your APIXO API key:
Authorization: Bearer YOUR_API_KEY
Submit requests also require:
Content-Type: application/json

Copy-paste async quickstart

This minimal request submits a speech task and returns a taskId.
curl -X POST "https://api.apixo.ai/api/v1/generateTask/cosyvoice-3-flash" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "request_type": "async",
    "input": {
      "mode": "speech",
      "voice": "longanlang_v3",
      "prompt": "Hello, welcome to APIXO.",
      "format": "mp3"
    }
  }'
Successful response:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678"
  }
}
Save the taskId; you need it to poll for the final result.
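
The same submission can be sketched in JavaScript. This is a minimal example, assuming Node 18 or later (for the global fetch) and an API key you supply yourself; buildSubmitRequest and submitTask are illustrative names, not part of any APIXO SDK.

```javascript
const BASE_URL = "https://api.apixo.ai/api/v1";
const MODEL_ID = "cosyvoice-3-flash";

// Build the URL and fetch options for a submit call (pure, no network access).
function buildSubmitRequest(apiKey, input) {
  return {
    url: `${BASE_URL}/generateTask/${MODEL_ID}`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ request_type: "async", input }),
    },
  };
}

// Send the request and return the taskId from the documented response shape.
// fetchImpl is injectable so the function can be exercised without a network.
async function submitTask(apiKey, input, fetchImpl = fetch) {
  const { url, options } = buildSubmitRequest(apiKey, input);
  const res = await fetchImpl(url, options);
  if (!res.ok) throw new Error(`Submit failed with HTTP ${res.status}`);
  const body = await res.json();
  return body.data.taskId;
}
```

A call such as submitTask(apiKey, { mode: "speech", voice: "longanlang_v3", prompt: "Hello, welcome to APIXO.", format: "mp3" }) should resolve to the taskId shown in the response above.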

Poll for result

curl -X GET "https://api.apixo.ai/api/v1/statusTask/cosyvoice-3-flash?taskId=task_12345678" \
  -H "Authorization: Bearer YOUR_API_KEY"
Processing response:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "processing",
    "createTime": 1767965610929
  }
}
Success response:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "success",
    "resultJson": "{\"resultUrls\":[\"https://file.apixo.ai/xxx.mp3\"]}",
    "createTime": 1767965610929,
    "completeTime": 1767965622450,
    "costTime": 11521
  }
}
Custom voice success response:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "success",
    "voice_id": "cosyvoice-v3-flash-xxxx",
    "voice_status": "OK",
    "resultJson": "{\"resultUrls\":[\"https://file.apixo.ai/preview.wav\"]}",
    "createTime": 1767965610929,
    "completeTime": 1767965682450,
    "costTime": 71521
  }
}
Failed response:
{
  "code": 200,
  "message": "success",
  "data": {
    "taskId": "task_12345678",
    "state": "failed",
    "failCode": "UNDEPLOYED",
    "failMsg": "Voice review failed. This custom voice is unavailable (UNDEPLOYED).",
    "voice_id": "cosyvoice-v3-flash-xxxx",
    "voice_status": "UNDEPLOYED",
    "createTime": 1767965610929,
    "completeTime": 1767965682450
  }
}
Parse resultJson after state becomes success:
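// `data` is the data object from a status response whose state is "success".
const payload = JSON.parse(data.resultJson); // resultJson is a JSON string, not an object
const audioUrls = payload.resultUrls; // array of audio file URLs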
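
Putting the polling and parsing steps together, a loop might look like the sketch below. It assumes Node 18 or later for the global fetch; waitForAudioUrls, isTerminal, and the intervalMs and maxPolls defaults are illustrative choices, not values prescribed by APIXO.

```javascript
const STATUS_URL = "https://api.apixo.ai/api/v1/statusTask/cosyvoice-3-flash";

// True once the task can no longer change state.
function isTerminal(state) {
  return state === "success" || state === "failed";
}

// Poll until the task finishes, then return the parsed audio URLs.
async function waitForAudioUrls(apiKey, taskId, options = {}) {
  const { intervalMs = 4000, maxPolls = 60, fetchImpl = fetch } = options;
  const url = `${STATUS_URL}?taskId=${encodeURIComponent(taskId)}`;
  for (let poll = 0; poll < maxPolls; poll++) {
    const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    if (!res.ok) throw new Error(`Status request failed with HTTP ${res.status}`);
    const { data } = await res.json();
    if (isTerminal(data.state)) {
      if (data.state === "failed") throw new Error(`${data.failCode}: ${data.failMsg}`);
      // resultJson is a JSON string, so it needs a second parse.
      return data.resultJson ? JSON.parse(data.resultJson).resultUrls : [];
    }
    // Still pending or processing: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} was still running after ${maxPolls} polls`);
}
```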

Request body

Speech

{
  "request_type": "async",
  "input": {
    "mode": "speech",
    "voice": "longanlang_v3",
    "prompt": "Welcome to APIXO.",
    "format": "mp3",
    "sample_rate": 22050
  }
}

Clone

{
  "request_type": "async",
  "input": {
    "mode": "clone",
    "prefix": "demo01",
    "audio_urls": [
      "https://example.com/reference.wav"
    ]
  }
}

Design

{
  "request_type": "async",
  "input": {
    "mode": "design",
    "prefix": "demo02",
    "prompt": "A warm and clear female narration voice for tutorials.",
    "preview_text": "Hello, this is the preview sample.",
    "format": "wav"
  }
}

Parameters

request_type (string, required, default: "async")
Result delivery mode. Use async for polling with statusTask, or callback for webhook delivery.
callback_url (string)
Required when request_type is callback. Must be a public HTTPS URL that can receive the final task payload. See Webhooks.
input (object, required)
CosyVoice 3 Flash input parameters.
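
The relationship between request_type and callback_url can be expressed as a small builder. This is a sketch only: buildRequestBody is a hypothetical helper, and it checks just the HTTPS scheme, since whether a URL is publicly reachable cannot be verified locally.

```javascript
// Build the top-level request body. `input` is passed through unchanged.
function buildRequestBody(input, callbackUrl) {
  if (callbackUrl === undefined) {
    // No callback URL: the caller will poll statusTask.
    return { request_type: "async", input };
  }
  const parsed = new URL(callbackUrl); // throws on malformed URLs
  if (parsed.protocol !== "https:") {
    throw new Error("callback_url must use HTTPS");
  }
  return { request_type: "callback", callback_url: callbackUrl, input };
}
```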

Response format

Submit task response

POST /generateTask/cosyvoice-3-flash returns a task ID when the task is accepted:
code (integer)
API status code. 200 means the task was accepted.
message (string)
Human-readable status message.
data.taskId (string)
Unique task identifier used with the status endpoint.

Status response fields

taskId (string)
Unique task identifier.
state (string)
Current task state: pending, processing, success, or failed.
resultJson (string)
JSON string containing audio result URLs. Present when audio output is available.
voice_id (string)
Custom voice ID returned by clone or design workflows.
voice_status (string)
Upstream custom voice status such as DEPLOYING, OK, or UNDEPLOYED.
failCode (string)
Machine-readable failure code. Present when state is failed.
failMsg (string)
Human-readable failure message. Present when state is failed.
createTime (integer)
Task creation timestamp in Unix milliseconds.
completeTime (integer)
Task completion timestamp in Unix milliseconds. Present after completion.
costTime (integer)
Processing duration in milliseconds. Present after successful completion.
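
Because several of these fields are optional, client code usually needs to guard each access. The sketch below reduces a status data object to the values most callers act on; summarizeStatus is a hypothetical helper. Deriving elapsed time from completeTime minus createTime is an assumption based on the sample responses on this page, where that difference equals costTime.

```javascript
// Reduce a status `data` object to the fields most callers act on.
function summarizeStatus(data) {
  const finished = data.state === "success" || data.state === "failed";
  return {
    taskId: data.taskId,
    finished,
    // Optional fields are absent until the task reaches the relevant state.
    audioUrls: data.resultJson ? JSON.parse(data.resultJson).resultUrls ?? [] : [],
    voiceId: data.voice_id ?? null,
    voiceStatus: data.voice_status ?? null,
    error: data.state === "failed" ? { code: data.failCode, message: data.failMsg } : null,
    createdAt: new Date(data.createTime), // Unix milliseconds to Date
    // Wall-clock duration; null while the task is still running.
    elapsedMs: data.completeTime !== undefined ? data.completeTime - data.createTime : null,
  };
}
```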

Webhook callback mode

Use callback mode when your backend should receive the final result automatically instead of polling.
curl -X POST "https://api.apixo.ai/api/v1/generateTask/cosyvoice-3-flash" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "request_type": "callback",
    "callback_url": "https://your-server.com/webhooks/apixo",
    "input": {
      "mode": "speech",
      "voice": "longanyang",
      "prompt": "Please welcome everyone to the event."
    }
  }'
See Webhooks for delivery requirements and retry behavior.

Billing

CosyVoice 3 Flash uses different billing units by workflow.
| Workflow | APIXO price |
| --- | --- |
| speech | $0.10 / 10K characters |
| clone | $0.002 / request |
| design | $0.03 / request |
For current route and market comparison pricing, see Pricing.
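
For budgeting, the listed prices can be turned into a rough estimator. The sketch below assumes speech is billed pro rata per character rather than in whole 10K blocks, which this page does not state; estimateCostUsd is a hypothetical helper, and the rates are copied from the table above and may change.

```javascript
// Prices from the billing table, in millionths of a US dollar.
const MICRO_USD = {
  speechPerChar: 10, // $0.10 per 10,000 characters
  clone: 2000, // $0.002 per request
  design: 30000, // $0.03 per request
};

// Estimate total cost in USD. Integer micro-dollars avoid float drift while summing.
function estimateCostUsd({ speechChars = 0, cloneRequests = 0, designRequests = 0 } = {}) {
  const micro =
    speechChars * MICRO_USD.speechPerChar +
    cloneRequests * MICRO_USD.clone +
    designRequests * MICRO_USD.design;
  return micro / 1_000_000;
}
```

Under that assumption, a 20,000-character speech prompt would come to about $0.20, and one clone plus one design request to about $0.032.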

Latency and polling

Actual latency may vary by text length, voice route, queue load, and whether you are creating a custom voice.
| Workflow | Typical generation time | Recommended first poll | Poll interval |
| --- | --- | --- | --- |
| speech | 5s-30s | 5s after task creation | 3s-5s |
| clone | 30s-180s | 20s after task creation | 5s-10s |
| design | 30s-180s | 20s after task creation | 5s-10s |
Use callback mode for production voice creation workflows so your backend does not need to poll during longer DEPLOYING periods.
Custom voice enrollment can remain in processing while the upstream voice status is still DEPLOYING.
Custom voice_id records are garbage-collected after 7 consecutive days without use. To keep a custom voice active, call speech with that voice_id at least once in every 7-day window.
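
The polling recommendations and the 7-day retention rule can be encoded as small helpers. This is a sketch under assumptions: the switch from the lower to the upper interval bound after five polls and the one-day safety margin are arbitrary illustrative choices, and pollDelayMs and needsKeepAlive are hypothetical names.

```javascript
// First-poll delay and interval bounds per workflow, in milliseconds.
const POLL_SCHEDULE = {
  speech: { firstPollMs: 5000, intervalMs: [3000, 5000] },
  clone: { firstPollMs: 20000, intervalMs: [5000, 10000] },
  design: { firstPollMs: 20000, intervalMs: [5000, 10000] },
};

// Delay to wait before poll number `attempt` (0-based) for a workflow.
function pollDelayMs(mode, attempt) {
  const schedule = POLL_SCHEDULE[mode];
  if (!schedule) throw new Error(`Unknown mode: ${mode}`);
  if (attempt === 0) return schedule.firstPollMs;
  // Assumption: use the lower bound at first, then relax to the upper bound.
  return attempt < 5 ? schedule.intervalMs[0] : schedule.intervalMs[1];
}

const DAY_MS = 24 * 60 * 60 * 1000;

// True when a custom voice should be used soon to stay inside the 7-day window.
function needsKeepAlive(lastUsedMs, nowMs = Date.now(), marginMs = DAY_MS) {
  return nowMs - lastUsedMs >= 7 * DAY_MS - marginMs;
}
```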

Errors and troubleshooting

HTTP errors

| Code | Meaning | What to do |
| --- | --- | --- |
| 400 | Invalid request body or parameter shape | Fix the request before retrying |
| 401 | Missing or invalid API key | Check the Authorization header |
| 402 | Insufficient balance or quota | Add balance or switch account/key |
| 403 | Key or route cannot access the model | Check permissions and route strategy |
| 429 | Rate limit or concurrency limit reached | Retry with exponential backoff |
| 500 | Server error | Retry with backoff |
| 502 | Upstream provider error | Retry with backoff |
| 504 | Upstream timeout | Retry or use callback mode |
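
The retry guidance in the table can be sketched as a wrapper around fetch. The base delay, cap, jitter, and retry count below are illustrative assumptions, not APIXO recommendations, and fetchWithRetry is a hypothetical helper. This page does not say whether submit requests are idempotent, so retrying a POST could conceivably create a second task.

```javascript
// Status codes the table marks as worth retrying.
const RETRYABLE_STATUS = new Set([429, 500, 502, 504]);

function isRetryable(status) {
  return RETRYABLE_STATUS.has(status);
}

// Exponential backoff: baseMs, 2x, 4x, ... capped at capMs.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry retryable responses; return the last response otherwise.
async function fetchWithRetry(url, options, { maxRetries = 4, fetchImpl = fetch } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, options);
    if (res.ok || !isRetryable(res.status) || attempt >= maxRetries) return res;
    const jitterMs = Math.random() * 250; // spread out simultaneous retries
    await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt) + jitterMs));
  }
}
```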

Task failure cases

| Fail code | Meaning | What to do |
| --- | --- | --- |
| UNDEPLOYED | Voice review failed and the custom voice is unavailable | Create a new voice with better input audio or a different design prompt |
| TASK_TIMEOUT | Custom voice deployment stayed in DEPLOYING for too long | Retry later |
| VALIDATION_ERROR | Input failed APIXO validation, or a voice_id is invalid for the current user/provider/model | Fix the input before retrying |
| UpstreamError | Upstream route returned an unmapped failure | Retry with backoff |
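
A client can map these fail codes to a next step. In the sketch below the retry flags are an interpretation of the "What to do" column, and actionForFailure is a hypothetical helper; unrecognized codes are treated as non-retryable as a conservative default.

```javascript
// Next step per documented failCode.
const FAIL_ACTIONS = new Map([
  ["UNDEPLOYED", { retry: false, advice: "Create a new voice with better input audio or a different design prompt" }],
  ["TASK_TIMEOUT", { retry: true, advice: "Retry later" }],
  ["VALIDATION_ERROR", { retry: false, advice: "Fix the input before retrying" }],
  ["UpstreamError", { retry: true, advice: "Retry with backoff" }],
]);

// Look up the action for a failed status `data` object.
function actionForFailure(data) {
  return (
    FAIL_ACTIONS.get(data.failCode) ?? {
      retry: false,
      advice: `Unrecognized failCode: ${data.failCode}`,
    }
  );
}
```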
Common validation rules:
  • voice is required for speech. APIXO also accepts legacy voice_id in the request and normalizes it into voice.
  • Custom voice_id values must belong to the current user, match the current provider, and match the target model.
  • Custom voice_id values are removed after 7 consecutive days without use.
  • Only cosyvoice-3-flash supports built-in system voices such as longanlang_v3, longanyang, and loongabby_v3.
  • audio_urls must be an array with exactly one non-empty URL.
  • prefix must be 1-10 letters or digits.
  • instruction must be at most 100 characters.
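
Checking these rules on the client before submitting can save a round trip. The sketch below covers only the rules and length limits listed on this page. It assumes prompt is required for speech and design, that prefix and preview_text are checked only when present, and that "letters or digits" means ASCII; validateInput is a hypothetical helper and APIXO's server-side validation may differ.

```javascript
// Return a list of human-readable problems; an empty list means no issue was found.
function validateInput(input) {
  const errors = [];
  const lengthBetween = (value, min, max) =>
    typeof value === "string" && value.length >= min && value.length <= max;

  switch (input.mode) {
    case "speech":
      // Legacy voice_id is accepted in place of voice.
      if (!input.voice && !input.voice_id) errors.push("voice is required for speech");
      if (!lengthBetween(input.prompt, 1, 20000)) errors.push("prompt must be 1-20000 characters");
      break;
    case "clone": {
      const urls = input.audio_urls;
      const valid =
        Array.isArray(urls) && urls.length === 1 &&
        typeof urls[0] === "string" && urls[0].trim() !== "";
      if (!valid) errors.push("audio_urls must be an array with exactly one non-empty URL");
      break;
    }
    case "design":
      if (!lengthBetween(input.prompt, 1, 500)) errors.push("prompt must be 1-500 characters");
      if (input.preview_text !== undefined && !lengthBetween(input.preview_text, 1, 200)) {
        errors.push("preview_text must be 1-200 characters");
      }
      break;
    default:
      errors.push("mode must be speech, clone, or design");
  }

  if (input.prefix !== undefined &&
      !(typeof input.prefix === "string" && /^[A-Za-z0-9]{1,10}$/.test(input.prefix))) {
    errors.push("prefix must be 1-10 letters or digits");
  }
  if (input.instruction !== undefined && !lengthBetween(input.instruction, 0, 100)) {
    errors.push("instruction must be at most 100 characters");
  }
  return errors;
}
```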