Model catalog
Latest and popular models
Start from the models people try first, then search and filter the full catalog by output type or workflow.
Output
Workflow
59 models
Showing the full catalog

bytedance
Seedance 2.0
Seedance 2.0 is ByteDance's multimodal video model supporting text-to-video, first-and-last-frames, and omni-reference modes. APIXO exclusive: unlimited concurrency, real-person portrait support, and hidden capabilities.

bytedance
Seedance 2.0 Fast
Seedance 2.0 Fast is the speed-optimized variant of ByteDance's multimodal video model. It supports text-to-video, first-and-last-frames, and omni-reference modes with lower per-second pricing and the same APIXO-exclusive capabilities.

Alibaba
HappyHorse
HappyHorse is Alibaba's video generation and editing model for text-to-video, image-to-video, reference-guided generation, and video-edit workflows with 720p/1080p output.

OpenAI
GPT-Image-2
GPT-Image-2 is OpenAI's next-generation image model for stronger photorealism, cleaner image editing, and sharper in-image text rendering.

Black Forest Labs
Flux 2
BFL’s latest Pro & Flex pipelines for text-to-image and image-to-image with unified 1K/2K pricing and ~30s generation.

Nano Banana Pro
Nano Banana Pro (Gemini 3 Pro Image) is Google's AGI-level image generation model with reasoning capabilities, native 4K output, Search Grounding for real-time data integration, near-perfect text rendering, and superior spatial awareness.

midjourney
Midjourney
Midjourney is an advanced AI image generation model known for its artistic and high-quality outputs. It excels at creating stylized, creative images with exceptional detail and aesthetic appeal. Supports both text-to-image and image-to-image generation.

Black Forest Labs
Flux Kontext
Professional-grade image generation with enhanced prompt understanding and superior quality output.

OpenAI
GPT-Image-1
GPT-Image-1 is OpenAI's advanced multimodal model for high-quality image generation with natural language understanding.

Nano Banana
Gemini 2.5 Flash Image Preview (aka Nano Banana) is an advanced AI model excelling in natural language-driven image generation and editing. It produces hyper-realistic, physics-aware visuals with seamless style transformations.

Alibaba
Wan 2.7
Wan 2.7 is Alibaba's video generation and editing model for text-to-video, image-to-video, reference-guided generation, and video-edit workflows with optional audio input and 720p/1080p output.

Alibaba
Wan 2.6
Wan 2.6 is Alibaba's multi-mode video generation model for text, image, flash image, reference, and flash reference workflows, with optional audio input and 720p/1080p output.

Veo 3.1 Extend
Veo 3.1 Extend continues existing Veo 3.1 tasks with new prompts and mode selection (fast or quality), enabling iterative video continuation workflows from internal task IDs.
seedream
Seedream 5.0
Seedream 5.0 is ByteDance's next-generation AI image model with real-time web search, controllable editing, and logical reasoning. It supports text-to-image and image-to-image with 2K/3K resolution, multiple aspect ratios, and up to 14 reference images.

OpenAI
Sora 2 Pro
Sora 2 Pro is OpenAI’s premium video generation model with higher quality output, supporting text-to-video and image-to-video at 720p and 1080p resolutions with flexible durations of 10 or 15 seconds.
Gemini 3 Pro
Gemini 3 Pro is Google's flagship multimodal reasoning model built for long-context chat, tool calling, and structured outputs. It accepts text and media inputs and returns high-quality text responses for production assistants and analytics workflows.
bytedance
Seedance 1.5 Pro
Seedance 1.5 Pro is ByteDance's per-second video model for fast text-to-video and image-to-video generation with 480p/720p output, optional sound, aspect ratio control, and fixed-lens camera stability.
hailuo
Hailuo 2.3
Hailuo 2.3 is Miniax's async video model with standard and pro modes for text-to-video and image-to-video generation. Standard mode supports 6s/10s at 768p, while pro mode returns fixed 5s at 1080p.
hailuo
Hailuo 2.3 Fast
Hailuo 2.3 Fast is Miniax's speed-optimized image-to-video model with standard and pro modes. Standard supports 6s/10s at 768p, while pro returns fixed 6s output at 1080p.
xai
Grok Image
Grok Image is xAI's image generation model for text-to-image and image-to-image workflows with simple aspect-ratio control and async task delivery.
Alibaba
Wan 2.2 Animate
Wan 2.2 Animate API is Alibaba's character animation model that combines one source image and one motion video to generate stylized animated outputs with animate/replace behavior.
xai
Grok Video
Grok Video is xAI's async video generation model for text-to-video and image-to-video workflows, with optional continuation via task_id + index and style control.
kling
Kling 3.0 Std
Kling 3.0 Std is Kuaishou's standard-quality video generation model with text-to-video, image-to-video, and motion-control modes. It supports clips up to 15 seconds with optional sound generation and flexible aspect ratios.
wavespeed
InfiniteTalk
InfiniteTalk converts one photo plus audio into audio-driven talking or singing avatar videos with precise lip synchronization. Supports up to 10 minutes at 480p or 720p resolution.
Nano Banana 2
Nano Banana 2 is Google’s high-resolution image generation model with 1K/2K/4K output control, 20,000-character prompts, optional Google Search context, and support for up to 14 reference images.
vidu
Vidu Q3
Vidu Q3 is a per-second video generation model that combines standard and Turbo text-to-video plus image-to-video workflows in one API. It supports single-image animation, first-and-last-frame transitions, optional sound and BGM, and output up to 1080p.
kling
Kling 2.5 Turbo Pro
Kling 2.5 Turbo Pro is Kuaishou's high-speed video model for text-to-video and image-to-video creation. It supports 5-10 second clips, optional tail-frame images, aspect ratio control for text-to-video, plus negative prompts and CFG scale guidance.
Lightricks
LTX-2 19B
LTX-2 19B is Lightricks' open-source 19B diffusion transformer for cinematic video generation. It supports text-to-video and image-to-video workflows, LoRA conditioning, and high-fidelity outputs up to 1080p in the API.
kling
Kling 2.1
Kling 2.1 is Kuaishou's multi-tier video model with Standard, Pro, and Master modes for image-to-video and text-to-video creation. It supports 5-10 second clips, optional tail images for Pro, and aspect ratio control for Master text-to-video.
kling
Kling 2.6
Kling 2.6 is Kuaishou's native audio-visual video model that generates video, speech, sound effects, and ambience in one pass. It supports text-to-audio-visual and image-to-audio-visual creation with Chinese and English voice generation and up to 10-second clips.

suno
Suno V5
Latest Suno text-to-music model that returns two polished songs per call with faster queues and richer vocals.
Claude
Claude Opus 4.7
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Opus 4.7 Thinking
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Opus 4.6
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Opus 4.6 Thinking
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Opus 4.5 20251101
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Opus 4.5 20251101 Thinking
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Sonnet 4.6
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Sonnet 4.6 Thinking
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Sonnet 4.5 20250929
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Sonnet 4.5 20250929 Thinking
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Haiku 4.5 20251001
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Claude
Claude Haiku 4.5 20251001 Thinking
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
OpenAI
GPT-5.4
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
OpenAI
GPT-5.4 Pro
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
OpenAI
GPT-5.4 Mini
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
OpenAI
GPT-5.4 Nano
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
OpenAI
GPT-5.2
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
OpenAI
GPT-5.1
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Gemini
Gemini 3.1 Pro Preview
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Gemini
Gemini 3.1 Flash Lite Preview
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Gemini
Gemini 3 Flash Preview
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Gemini
Gemini 2.5 Pro
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Gemini
Gemini 2.5 Flash
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.
Gemini
Gemini 2.5 Flash Lite
OpenAI-compatible gateway for chat, agents, reasoning, and structured generation.

Alibaba
Wan 2.5
Wan 2.5 is Alibaba's video generation model for text-to-video and image-to-video workflows, with optional audio input, 480p/720p/1080p output, 5 or 10 second clips, and prompt expansion.

Veo 3.1
Google DeepMind’s upgraded AI video model for realistic motion generation, extended clip duration, multi-image reference control, and synchronized audio output in native 1080p.

OpenAI
Sora 2
Sora 2 is OpenAI’s latest AI video generation model, supporting both text-to-video and image-to-video. It delivers realistic motion, physics consistency, with improved control over style, scene, and aspect ratio—ideal for creative apps and social media content.
seedream
Seedream 4.5
Seedream 4.5 is a powerful text-to-image and image-to-image AI model delivering high-quality image generation with support for 2K and 4K resolutions.