HunyuanVideo 1.5
Tencent · Diffusion Transformer · v1.5 · Verified
Tags: Best Value · prototyping · self-hosted
$0.02/sec
starting from, on WaveSpeed
Resolution
1080p (480p native, via super-resolution)
Duration
5–10s
Providers
2
API Pricing
Why HunyuanVideo 1.5?
Strengths
- Runs on consumer GPUs with just 14GB VRAM — the most hardware-accessible open-source video model
- Open source with full weights, training code, and LoRA fine-tuning pipeline on GitHub and Hugging Face
- SSTA attention mechanism delivers nearly 2x inference speedup over standard FlashAttention
- Extremely low-cost on WaveSpeed at $0.02/sec (480p) — the cheapest API option for video generation
- Strong motion coherence and prompt adherence despite compact 8.3B parameter size
Limitations
- Native output is only 480p — requires super-resolution for higher resolution output
- Short maximum duration (~5 seconds) compared to models offering 10-15 second clips
- No native audio generation, lip-sync, or dialogue capabilities
- Lower overall quality ranking (ELO 1,014) compared to premium models
- Limited aspect ratio support — only 16:9 and 9:16, no square (1:1) option
Prompt Guide
1. Keep prompts concise and focused on your core idea — less is more with HunyuanVideo. Overloading the prompt with excessive detail prevents the model from producing coherent results.
2. Use natural language descriptions rather than technical jargon — write prompts like 'sunset over ocean waves' rather than complex comma-separated keyword lists.
3. Include specific sensory details to add texture — 'sunlight glistening on wet pavement after rain' creates richer output than generic scene descriptions.
4. Stick to one or two style cues when defining tone and aesthetics — conflicting style terms (e.g., 'photorealistic anime noir') confuse the model.
5. For in-video text generation, enclose the desired text in quotation marks within your prompt — HunyuanVideo 1.5 can render clear text within video frames.
✓ Do this
- Structure prompts with four components: Subject (main focus), Setting (environment), Action (movement and change), and Style (camera, lighting, mood)
- Enable prompt expansion for automatic enhancement — it refines raw prompts for better semantic understanding and output quality
- Iterate and refine — review the first output and adjust your description for clarity or additional detail rather than starting from scratch
- Use the lightweight nature for rapid prototyping at 480p, then upscale final selections with the built-in super-resolution module
- Leverage the open-source model for LoRA fine-tuning on consumer GPUs (14GB VRAM minimum) for custom characters or brand styles
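The four-part prompt structure recommended above (Subject, Setting, Action, Style) can be sketched as a small template helper. This is only an illustration of composing a prompt string in that order, not an official tool; the function name and phrasing are my own:

```python
def compose_prompt(subject: str, setting: str, action: str, style: str) -> str:
    """Join the four recommended components into one natural-language prompt.

    Order follows the guide: Subject (main focus), Setting (environment),
    Action (movement and change), Style (camera, lighting, mood).
    """
    return f"{subject} {setting}. {action}. {style}."

prompt = compose_prompt(
    subject="A golden retriever runs through",
    setting="a field of wildflowers on a sunny afternoon",
    action="The dog leaps joyfully, ears flapping in the wind",
    style="Shallow depth of field, warm golden light, slow motion feel",
)
print(prompt)
```

Keeping each component to a single clause also respects the guide's "less is more" advice: the result reads as natural language rather than a keyword list.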
✗ Avoid this
- Native output resolution is 480p — higher resolutions require the super-resolution upscaling module
- Maximum duration is approximately 5 seconds (121 frames) on FAL.ai — shorter than many competitors
- No native audio generation — output is video-only
- No camera control panel, motion brush, or multi-shot generation
- Complex multi-element prompts with many simultaneous actions may produce inconsistent results
Example Prompts
“A golden retriever runs through a field of wildflowers on a sunny afternoon. The dog leaps joyfully, ears flapping in the wind. Shallow depth of field, warm golden light, slow motion feel.”
“Close-up of rain droplets falling onto a still pond, creating expanding circular ripples. Each drop catches a glint of overcast sky. Macro lens perspective, muted cool tones, meditative pace.”
“A calligrapher writes the character 'Dream' with a brush on rice paper. Ink flows smoothly from the brush tip, each stroke deliberate and confident. Top-down camera angle, warm desk lamp lighting, shallow depth of field on the brush.”
Based on the official prompt guide.
FAQ
How much does HunyuanVideo 1.5 cost?
From $0.02/sec on WaveSpeed. A 5-second video ≈ $0.10.
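The arithmetic above is just per-second billing; a minimal sketch, assuming the $0.02/sec figure is WaveSpeed's 480p starting rate quoted on this page (other resolutions may bill differently):

```python
def estimate_cost(duration_sec: float, rate_per_sec: float = 0.02) -> float:
    """Estimate generation cost in USD at a per-second billing rate.

    0.02 USD/sec is WaveSpeed's advertised starting rate for 480p;
    treat any other tier as an assumption and check current pricing.
    """
    return round(duration_sec * rate_per_sec, 4)

print(estimate_cost(5))  # 5-second clip at the base rate -> 0.1
```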
Where can I use HunyuanVideo 1.5?
Via API on FAL.ai and WaveSpeed.
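A call through FAL's Python client (`pip install fal-client`) might look like the sketch below. The endpoint id `"fal-ai/hunyuan-video-1.5"`, the parameter names, and the response shape are assumptions for illustration — confirm them against the provider's model page before use:

```python
import os

def build_request(prompt: str, num_frames: int = 121) -> dict:
    """Assemble a request payload; 121 frames is roughly the ~5-second
    maximum noted for FAL.ai. Parameter names are assumed, not official."""
    return {"prompt": prompt, "num_frames": num_frames}

payload = build_request(
    "A golden retriever runs through a field of wildflowers, "
    "warm golden light, slow motion feel."
)

if os.environ.get("FAL_KEY"):  # only call the API when credentials are set
    import fal_client
    result = fal_client.subscribe(
        "fal-ai/hunyuan-video-1.5",  # assumed endpoint id
        arguments=payload,
    )
    print(result["video"]["url"])  # assumed response shape
```

Guarding the network call behind the `FAL_KEY` check keeps the snippet runnable offline while showing where the real request would go.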
How do I get good results with HunyuanVideo 1.5?
Keep prompts concise and focused on your core idea — less is more with HunyuanVideo. Overloading the prompt with excessive detail prevents the model from producing coherent results. See the prompt guide above.