HappyHorse

The HappyHorse API provides access to Alibaba's video generation and editing model for text-to-video, image-to-video, reference-guided generation, and video editing. Use it for cinematic short-form clips, ecommerce ads, digital-human presenter videos, and social creative production.

$0.125 – $0.225 / s

  • 720p / 1080p
  • 3–15 seconds
  • Text, image, reference, edit
  • Native audio output

Example outputs

Generated with the HappyHorse API on APIXO

HappyHorse text-to-video drama scene

HappyHorse image-to-video action clip

Complete Guide

What is the HappyHorse API?

HappyHorse 1.0 is Alibaba's video generation and editing model for creating short cinematic clips from text, a first-frame image, multiple visual references, or an existing source video.

The HappyHorse API is aimed at short-drama production, ecommerce advertising, digital-human presenter content, and social creative videos, where high-quality motion has to be produced quickly from compact inputs.

Choose the right HappyHorse API mode

  • HappyHorse text-to-video: start from a prompt when you have an idea but no visual material, or when you need to test a creative direction quickly.
  • HappyHorse image-to-video: upload one image as the first frame when the opening composition, pose, color, or product layout must stay precise.
  • HappyHorse reference-to-video: upload 1–9 images when the model should refer to subject appearance, style, or scene information without locking the first frame.
  • HappyHorse video-edit: transform one existing video with an edit prompt and optional reference images.
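The mode choice above can be sketched as a small helper. This is only an illustration: the function name, argument names, and mode strings are hypothetical labels rather than confirmed API values, and the priority order (source video first, then first frame, then references) is an assumption about how a team might resolve overlapping inputs.

```python
from typing import Optional, Sequence


def choose_mode(
    first_frame: Optional[str] = None,
    references: Sequence[str] = (),
    source_video: Optional[str] = None,
) -> str:
    """Pick a mode label from whichever inputs are available.

    The returned strings are illustrative labels, not confirmed API enums.
    """
    if source_video is not None:
        # Existing footage: transform it with an edit prompt.
        return "video-edit"
    if first_frame is not None:
        # One image that must become the exact opening frame.
        return "image-to-video"
    if references:
        # Images guide appearance or style without fixing the first frame.
        return "reference-to-video"
    # No visual material at all: start from the prompt alone.
    return "text-to-video"
```

For example, `choose_mode(references=["hero.png", "logo.png"])` returns the reference-to-video label, while calling it with no arguments falls back to text-to-video.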

HappyHorse prompt structure for better results

A HappyHorse prompt is the positive instruction that describes what appears in the video and how it moves. More precise, more detailed prompts usually improve HappyHorse output quality.

For HappyHorse API requests, a practical structure is scene + subject + motion + audio. The scene describes the environment around the subject, including background and foreground, whether realistic or imagined.

Recommended Prompt Formula

Use subject + action + scene + camera + style for visual control, then add audio cues when sound matters.
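As a rough sketch of that formula, the helper below assembles the visual parts in order and appends an audio cue only when one is given. The function and parameter names are hypothetical, and the comma-separated layout is just one reasonable way to write such a prompt; the 2500-character check reflects the prompt length limit listed in the notes for this API.

```python
MAX_PROMPT_CHARS = 2500  # prompt length limit stated in the notes


def build_prompt(subject, action, scene, camera="", style="", audio=""):
    """Assemble subject + action + scene + camera + style, then optional audio."""
    visual_parts = [f"{subject} {action}", scene, camera, style]
    # Skip empty parts so optional fields do not leave stray commas.
    visual = ", ".join(p.strip() for p in visual_parts if p.strip())
    prompt = visual + "."
    if audio.strip():
        prompt += f" Audio: {audio.strip()}."
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}"
        )
    return prompt


example = build_prompt(
    subject="A courier",
    action="sprints across a rain-soaked rooftop",
    scene="neon city at night",
    camera="low tracking shot",
    style="cinematic, shallow depth of field",
    audio="heavy rain and distant sirens",
)
```

Keeping the parts as separate arguments makes it easier to vary one element, such as the camera move, while holding the rest of the prompt fixed across test runs.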

HappyHorse API pricing logic before production

HappyHorse API usage is billed per second with separate 720p and 1080p rates. HappyHorse text-to-video, image-to-video, and reference-to-video multiply output duration by the selected resolution rate.

Video-Edit Billing

Video-edit uses min(input video duration, 15) × 2 × rate, because the public duration parameter is ignored for that mode.
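The billing rules can be turned into a quick estimator. This is a sketch under stated assumptions: the page only gives a price range of $0.125 to $0.225 per second, so mapping the lower figure to 720p and the higher to 1080p is an assumption, and the function name and mode strings are hypothetical.

```python
# ASSUMPTION: the low end of the published range is the 720p rate and the
# high end is the 1080p rate. Check the current price list before relying on it.
RATES_USD_PER_SECOND = {"720p": 0.125, "1080p": 0.225}
MAX_BILLED_EDIT_SECONDS = 15


def estimate_cost(mode: str, resolution: str, seconds: float) -> float:
    """Estimate the charge in USD for one job.

    For video-edit, `seconds` is the input video duration; for the other
    modes it is the requested output duration.
    """
    if resolution not in RATES_USD_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    rate = RATES_USD_PER_SECOND[resolution]
    if mode == "video-edit":
        # min(input video duration, 15) x 2 x rate
        return min(seconds, MAX_BILLED_EDIT_SECONDS) * 2 * rate
    # text-to-video, image-to-video, reference-to-video: duration x rate
    return seconds * rate
```

Under these assumed rates, an 8-second 720p text-to-video clip comes to 8 × 0.125 = $1.00, and a 20-second source clip edited at 1080p is billed as 15 × 2 × 0.225 = $6.75.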

HappyHorse API workflow for production teams

A reliable HappyHorse API workflow starts by choosing the smallest input that gives the model enough control: text when the idea is flexible, one first-frame image when composition is fixed, reference images when identity or style matters, and video-edit when source footage already exists.

For production, HappyHorse API requests should store the prompt, mode, resolution, input URLs, taskId, and final result URL together. This makes HappyHorse outputs easier to audit, reproduce, compare, and reuse across campaigns.
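One way to keep those fields together is a small record type that serializes to JSON. The class and field names below are hypothetical and simply mirror the items listed above; they are not taken from the API's response schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class JobRecord:
    """Everything needed to audit or reproduce one generation job."""

    prompt: str
    mode: str
    resolution: str
    input_urls: List[str] = field(default_factory=list)
    task_id: Optional[str] = None      # filled in once the task is created
    result_url: Optional[str] = None   # filled in once the task finishes

    def to_json(self) -> str:
        # sort_keys gives stable output, which makes records easy to diff.
        return json.dumps(asdict(self), sort_keys=True)


record = JobRecord(
    prompt="A courier sprints across a rain-soaked rooftop.",
    mode="text-to-video",
    resolution="720p",
)
record.task_id = "task-123"  # placeholder value for illustration
```

Writing one such record per job to a log or database row is usually enough to compare outputs across campaigns later.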

  • Use HappyHorse text-to-video for concept exploration and fast creative direction tests.
  • Use HappyHorse image-to-video when an existing product image, poster, or AI still needs controlled motion.
  • Use HappyHorse reference-to-video when character appearance, scene style, or brand visual language must remain consistent.
  • Use HappyHorse video-edit when a source clip needs a style, scene, or action transformation.
  • Run HappyHorse API jobs asynchronously and poll after the initial wait instead of querying immediately.
  • Keep HappyHorse prompts specific about scene, subject, movement, camera, and audio so the generated result matches production intent.
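The asynchronous pattern in the list above, waiting first and then polling at an interval, might look like the sketch below. It does not call any real endpoint: `fetch_status` is a caller-supplied function, and the `state` key with its `succeeded` and `failed` values are hypothetical names chosen for illustration, as are the default timings.

```python
import time
from typing import Callable, Dict


def wait_for_result(
    fetch_status: Callable[[], Dict],
    initial_wait: float = 10.0,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Wait once up front, then poll until the task reaches a final state."""
    sleep(initial_wait)  # do not query immediately after submitting
    waited = initial_wait
    while True:
        status = fetch_status()
        if status.get("state") in ("succeeded", "failed"):
            return status
        if waited + interval > timeout:
            raise TimeoutError(f"task not finished after {waited} seconds")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop testable without real delays. In practice `fetch_status` would wrap whatever task-query request the API documentation specifies for a given taskId.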

HappyHorse API technical specs

Current HappyHorse API surface for production integration through APIXO.

  • HappyHorse Modes: text, image, reference, video edit
  • HappyHorse Image References: 1 image or 1–9 references
  • HappyHorse Resolution: 720p and 1080p
  • HappyHorse Workflow: async task with polling or callback

Production Scenarios

Where HappyHorse performs best

HappyHorse short-drama detail

HappyHorse performs well in short-drama production where plot detail, lighting atmosphere, and character consistency need to hold across a compact scene.

HappyHorse ecommerce ad production

The HappyHorse API produces natural-looking digital-human presenter videos and follows instructions closely, making it useful for batch production of merchant ad materials.

HappyHorse social creative iteration

HappyHorse can quickly produce polished, tightly paced clips that adapt to multiple social styles. Image-to-video is especially useful here because it lowers production effort.

What can you build?

HappyHorse film and short-drama scenes

Use HappyHorse to generate story beats with controlled atmosphere, character action, and cinematic framing for concept validation or short-form narrative production.

HappyHorse product and ecommerce ads

Turn product stills, campaign prompts, and presenter concepts into HappyHorse motion assets for listings, landing pages, and paid social.

HappyHorse digital-human presenter clips

Create HappyHorse presenter-style videos where script direction, gesture, and brand tone need to stay aligned with the prompt.

HappyHorse social creative videos

Produce visually polished HappyHorse clips for short-video platforms, from surreal ideas to realistic lifestyle scenes and fast meme-driven concepts.

Notes & limitations

  • HappyHorse API prompt is required for text-to-video, reference-to-video, and video-edit, and can be up to 2500 characters.
  • HappyHorse image-to-video requires exactly one image. The image locks the first frame, so crop it to the target composition before upload.
  • HappyHorse reference-to-video requires 1–9 reference images and uses those images as visual guidance, not as a fixed first frame.
  • HappyHorse video-edit requires exactly one input video and can accept 0–5 optional reference images.
  • HappyHorse video-edit does not expose a public output-duration parameter; billing is based on min(input video duration, 15) × 2 × the selected resolution rate.
  • HappyHorse output includes audio by default, and audio generation cannot be fully disabled.
  • For long-running tasks, callback mode is recommended over frequent polling.
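The input limits above can be checked before a request is sent. The validator below is a minimal sketch: the function name, argument names, and mode strings are hypothetical, and it encodes only the constraints listed in these notes.

```python
from typing import List, Sequence

MAX_PROMPT_CHARS = 2500
MODES = ("text-to-video", "image-to-video", "reference-to-video", "video-edit")


def validate_request(
    mode: str,
    prompt: str = "",
    images: Sequence[str] = (),
    videos: Sequence[str] = (),
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look valid."""
    if mode not in MODES:
        return [f"unknown mode: {mode}"]
    errors = []
    # Prompt is required for every mode except image-to-video.
    if mode != "image-to-video" and not prompt.strip():
        errors.append("prompt is required for this mode")
    if len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if mode == "image-to-video" and len(images) != 1:
        errors.append("image-to-video requires exactly one image")
    if mode == "reference-to-video" and not 1 <= len(images) <= 9:
        errors.append("reference-to-video requires 1-9 reference images")
    if mode == "video-edit":
        if len(videos) != 1:
            errors.append("video-edit requires exactly one input video")
        if len(images) > 5:
            errors.append("video-edit accepts at most 5 reference images")
    return errors
```

Returning all problems at once, rather than raising on the first, lets a batch pipeline report every issue with a job in a single pass.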
