PixVerse V6

social-media · product-ads · transitions

PixVerse · Diffusion Transformer · v6.0 · Verified

From $0.03/sec on FAL.ai (rounded; the lowest listed rate is $0.025/s)

Resolution: 1080p

Duration: 1–15s

Providers: FAL.ai, WaveSpeed, PixVerse Platform API

Text-to-Video · Image-to-Video · Audio · Camera · Multi-Shot · Lipsync · Extend

API Pricing

FAL.ai · V6 · Cheapest

Text-to-Video: $0.025/s
Text-to-Video + Audio: $0.035/s
Text-to-Video: $0.035/s
Text-to-Video + Audio: $0.045/s
Text-to-Video: $0.045/s
Text-to-Video + Audio: $0.060/s
Text-to-Video: $0.090/s
Text-to-Video + Audio: $0.115/s
Image-to-Video: $0.045/s
Image-to-Video + Audio: $0.115/s
Transition: $0.03

Verified 2026-04-10

WaveSpeed · V6

Text-to-Video: $0.025/s
Extend + Audio: $0.115/s

Verified 2026-04-10

PixVerse Platform API · V6

Text-to-Video + Audio: $0.115/s
Lip Sync + Audio: price not listed

Verified 2026-04-10
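
The listed rates can be turned into a quick budget estimate. The sketch below copies the per-second prices from the three provider listings above and multiplies by clip length. The names `Rate`, `cheapest`, and `clip_cost` are illustrative helpers, not part of any provider SDK, and the lowercase mode keys are an assumption made for this example. Rows without a per-second price (Transition, Lip Sync) are left out.

```python
from decimal import Decimal
from typing import NamedTuple


class Rate(NamedTuple):
    provider: str
    mode: str
    audio: bool
    usd_per_sec: Decimal


# Per-second rates as listed above (verified 2026-04-10).
# FAL.ai lists four Text-to-Video price pairs; they are kept in listing order.
RATES = [
    Rate("FAL.ai", "text-to-video", False, Decimal("0.025")),
    Rate("FAL.ai", "text-to-video", True, Decimal("0.035")),
    Rate("FAL.ai", "text-to-video", False, Decimal("0.035")),
    Rate("FAL.ai", "text-to-video", True, Decimal("0.045")),
    Rate("FAL.ai", "text-to-video", False, Decimal("0.045")),
    Rate("FAL.ai", "text-to-video", True, Decimal("0.060")),
    Rate("FAL.ai", "text-to-video", False, Decimal("0.090")),
    Rate("FAL.ai", "text-to-video", True, Decimal("0.115")),
    Rate("FAL.ai", "image-to-video", False, Decimal("0.045")),
    Rate("FAL.ai", "image-to-video", True, Decimal("0.115")),
    Rate("WaveSpeed", "text-to-video", False, Decimal("0.025")),
    Rate("WaveSpeed", "extend", True, Decimal("0.115")),
    Rate("PixVerse Platform API", "text-to-video", True, Decimal("0.115")),
]


def cheapest(mode: str, audio: bool) -> Rate:
    """Return the lowest listed rate for a mode, with or without audio."""
    matches = [r for r in RATES if r.mode == mode and r.audio == audio]
    if not matches:
        raise ValueError(f"no listed rate for mode={mode!r}, audio={audio}")
    return min(matches, key=lambda r: r.usd_per_sec)


def clip_cost(rate: Rate, seconds: int) -> Decimal:
    """Cost in USD of one clip; the listing gives a 1-15 second duration range."""
    if not 1 <= seconds <= 15:
        raise ValueError("duration must be between 1 and 15 seconds")
    return rate.usd_per_sec * seconds
```

For example, `clip_cost(cheapest("text-to-video", audio=False), 5)` gives 0.125 USD, which matches the roughly $0.13 quoted for a 5-second video. `Decimal` is used so that small per-second prices multiply without floating-point rounding noise.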

Why PixVerse V6?

Strengths

Widest mode variety — text-to-video, image-to-video, transition, extend, and lip-sync all available via API
Granular resolution pricing from $0.025/sec (360p) to $0.115/sec (1080p+audio) — flexible for budget and quality needs
20+ cinematic lens controls (focal length, aperture, DoF, chromatic aberration, vignetting) for precise camera work
Available on 3 providers (FAL.ai, WaveSpeed, PixVerse Platform) with CLI and MCP server support for developer workflows
Physics simulation engine for realistic fluid, material, and light interactions

Limitations

Lower AA Arena ranking (#16 T2V, ELO 1,209) compared to top-tier models like HappyHorse or Seedance 2.0
Not open-source or self-deployable — closed weights, API-only access
Lip-sync requires a separate API endpoint rather than integrated single-pass generation
No video-to-video editing capability — cannot restyle or edit existing footage
Extreme camera movements in single shots can produce visual artifacts

Prompt Guide

1. Describe observable elements, not emotions: 'wide tracking shot through pine trees with morning side light, a fox walking left to right' outperforms 'a magical energetic forest scene.'
2. Focus on one primary action per clip. Competing movements reduce quality, so keep it simple for cleaner, more consistent results.
3. Set resolution to 1080p before entering text, so the configuration matches your target quality before you generate.
4. Use specific camera language. V6's 20+ lens controls respond to terms like focal length, aperture, depth of field, and chromatic aberration in prompts.
5. Leverage multi-shot mode with consistent descriptions: keep character and environment descriptions identical across shots to maintain visual continuity.
6. Match aspect ratio to distribution channel: 9:16 for mobile/TikTok, 16:9 for widescreen/YouTube, 1:1 for Instagram feed.
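
The aspect-ratio advice in the last tip can be captured as a small lookup. This is a minimal sketch: the channel keys and the `aspect_ratio_for` helper are hypothetical names chosen for illustration, and only the three ratios named in the guide are included.

```python
# Channel-to-ratio mapping taken from the prompt guide above.
# The channel keys are illustrative; rename them to fit your pipeline.
ASPECT_RATIOS = {
    "mobile": "9:16",
    "tiktok": "9:16",
    "widescreen": "16:9",
    "youtube": "16:9",
    "instagram-feed": "1:1",
}


def aspect_ratio_for(channel: str) -> str:
    """Return the recommended aspect ratio for a distribution channel."""
    key = channel.strip().lower()
    if key not in ASPECT_RATIOS:
        raise ValueError(f"no recommended aspect ratio for channel {channel!r}")
    return ASPECT_RATIOS[key]
```

Raising on an unknown channel, rather than falling back to a default, keeps a typo from silently producing a clip in the wrong shape.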

✓ Do this

Structure prompts with: Subject + Action + Environment + Details + Motion + Camera + Style
For transition mode, provide two distinct images and describe the desired movement between them — V6 handles camera angle transitions natively
For extend mode, guide the extension toward the opening frame for seamless loops, or stretch clips to fit different ad slot durations
Use the physics engine for fluid and material shots — viscosity, surface tension, and light interaction are rendered with improved accuracy in V6
For lip-sync, provide clear audio with minimal background noise — the model matches mouth movements to speech with high precision
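
The Subject + Action + Environment + Details + Motion + Camera + Style structure can be sketched as a small prompt builder. `PromptParts` and `build_prompt` are hypothetical helpers written for this page, and joining the parts as short sentences is an assumption about formatting, not a documented requirement of the model.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptParts:
    # Field order follows the recommended prompt structure.
    subject: str
    action: str
    environment: str = ""
    details: str = ""
    motion: str = ""
    camera: str = ""
    style: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty parts, in structure order, into one prompt string."""
    # Subject and action read best as a single opening sentence.
    pieces = [f"{parts.subject.strip()} {parts.action.strip()}"]
    for f in fields(parts)[2:]:
        pieces.append(getattr(parts, f.name))

    cleaned = []
    for piece in pieces:
        text = piece.strip().rstrip(".")
        if text:
            cleaned.append(text[0].upper() + text[1:])
    return ". ".join(cleaned) + "."
```

A usage example, reusing the fox scene from the guide:

```python
prompt = build_prompt(PromptParts(
    subject="a fox",
    action="walking left to right",
    environment="pine forest with morning side light",
    camera="wide tracking shot, f/2.8",
))
```

Keeping each element in its own field makes it easy to hold the subject and environment text fixed while varying only the camera or style between generations.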

✗ Avoid this

Don't use extreme camera motions in a single shot. They can produce artifacts, so keep camera movement moderate for best results
Don't rely on text rendering inside scenes. It is not reliably supported
Don't expect physically exact results. Physics simulation is improved but still approximate, and complex fluid interactions may not be physically accurate
Don't plan for self-hosting. The model is closed source with no model weights available
Don't expect lip-sync in the same pass as generation. It is a separate endpoint and requires a two-step workflow
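
Because lip-sync runs as a separate step from video generation, client code usually ends up as a two-stage pipeline. The sketch below shows only that shape. Both callables are placeholders that you would implement against your chosen provider's API; no real endpoint names, parameters, or return types are assumed here.

```python
from typing import Callable


def generate_with_lipsync(
    prompt: str,
    audio_path: str,
    generate_video: Callable[[str], str],
    apply_lipsync: Callable[[str, str], str],
) -> str:
    """Run the two-step workflow and return a reference to the final video.

    generate_video(prompt) -> video reference (hypothetical: a URL or job ID)
    apply_lipsync(video_ref, audio_path) -> reference to the lip-synced video
    """
    # Step 1: generate the base clip from the text prompt.
    video_ref = generate_video(prompt)
    # Step 2: send that clip plus the speech audio to the lip-sync step.
    return apply_lipsync(video_ref, audio_path)
```

Passing the two steps in as functions keeps the workflow testable with stubs and makes it straightforward to swap providers without touching the orchestration.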

Example Prompts

Nature / Cinematic

“Wide tracking shot through pine trees with morning side light, a fox walking steadily from left to right, leaves rustling on the forest floor. Shallow depth of field, f/2.8 aperture, warm color grading.”

Product / Food

“Close-up of honey slowly dripping from a wooden dipper onto a stack of pancakes. Camera holds steady, macro lens, golden morning light. The viscous flow catches light as it pools. Sound of sizzling butter.”

Corporate / Multi-shot

“A woman in a business suit walks through a modern glass office, turns to camera and speaks. Professional lighting, 16:9, 1080p. [Multi-shot: Shot 1 — tracking following from behind. Shot 2 — medium close-up facing camera.]”

Based on the official prompt guide →

FAQ

How much does PixVerse V6 cost?

From $0.025/sec on FAL.ai, shown rounded as $0.03/sec. A 5-second video at that rate costs 5 × $0.025 = $0.125, or about $0.13.

Where can I use PixVerse V6?

Via API on FAL.ai, WaveSpeed, and the PixVerse Platform API.

How do I get good results with PixVerse V6?

Describe observable elements, not emotions: 'wide tracking shot through pine trees with morning side light, a fox walking left to right' outperforms 'a magical energetic forest scene.' See the prompt guide above.