Wan 2.7
Open source · Commercial use allowed · Alibaba · Mixture-of-Experts Diffusion Transformer · v2.7 · Verified
API Pricing: from $0.10/sec on FAL.ai
Resolution: 1080p
Duration: 2–15s
Providers: 2
Why Wan 2.7?
Strengths
- Four modes in one architecture: text-to-video, image-to-video with frame anchoring, reference-to-video, and instruction-based editing
- A 27B-parameter MoE (14B active) delivers premium quality at a competitive $0.10/sec price point
- Open source under Apache 2.0 — free commercial use with no royalties or attribution required
- First/last-frame control enables precise scene transitions unavailable in most competitors
- Native audio with voice cloning and lip-sync in reference-to-video mode
- 1080p output at 30fps with 5 aspect ratios covers most production needs
Limitations
- Open weights not yet available at launch — self-deployment delayed to mid-Q2 2026
- 27B parameters make self-hosting extremely resource-intensive when weights ship
- AA Arena ELO of 1,186 (under Wan 2.6) may not fully reflect 2.7 improvements yet
- No 4K output — maxes at 1080p while some competitors offer 4K
- Newer model with less community tooling and fine-tuning ecosystem than established alternatives
Prompt Guide
1. Structure prompts by element — describe subject, style, lighting, and composition as distinct descriptors rather than a single run-on sentence (see the sketch after this list).
2. Specify camera directions explicitly: 'camera follows,' 'smooth pan left,' 'close-up,' 'aerial descending,' 'dolly zoom' — Wan 2.7 interprets complex camera blocking.
3. Include style keywords like 'cinematic,' 'cartoon,' 'realistic,' 'anime' to guide the model's visual treatment.
4. For first/last-frame control in image-to-video, describe the progression between the two anchor frames rather than restating what's in each image.
5. For instruction-based editing, be specific: 'change the jacket from red to navy' works better than 'make it look different.'
6. Leverage the 9-grid multi-image input for reference-to-video to maintain character consistency across multiple subjects.
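To make tip 1 concrete, here is a minimal sketch (plain Python, not part of any official SDK) that assembles a prompt from distinct descriptors instead of one run-on sentence; the element names are illustrative choices, not required fields.

```python
# Illustrative only: compose a Wan 2.7 text-to-video prompt element by element.
def build_prompt(subject, setting, movement, camera, style):
    # Keep each descriptor as its own clause so no single element gets buried.
    return ". ".join([subject, setting, movement, camera, style]) + "."

prompt = build_prompt(
    subject="A young woman in a leather jacket",
    setting="A bustling Tokyo street at night, neon reflecting on wet pavement",
    movement="She walks toward the camera through light rain",
    camera="Slow tracking shot, close-up as she passes",
    style="Cinematic, moody cyberpunk atmosphere, 1080p, 16:9",
)
print(prompt)
```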
✓ Do this
- Follow the pattern: Who/What + Where + Movement/Activity + Camera + Style for text-to-video
- Use endpoint anchors (first and last frame) for precise scene transitions in image-to-video mode; see the sketch after this list
- For multi-character scenes, use multi-reference grids with distinct visual identities per character
- Keep environmental context rich: lighting conditions, time of day, weather, atmosphere
- Use the edit-video mode for iterative refinement — cheaper than regenerating entire clips
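The endpoint-anchor workflow might look like the sketch below, using FAL.ai's fal_client Python SDK. The endpoint id, the first/last-frame parameter names, and the response shape are assumptions for illustration; check the provider's model page for the actual schema.

```python
import fal_client  # pip install fal-client; requires FAL_KEY in the environment

# Hypothetical endpoint id and field names -- not confirmed API parameters.
result = fal_client.subscribe(
    "fal-ai/wan/v2.7/image-to-video",  # assumed endpoint id
    arguments={
        # Describe the progression between the two anchors, not the anchors themselves.
        "prompt": "Dawn haze over the lighthouse gradually gives way to full sunrise, "
                  "camera slowly pulling back to reveal the rocky coastline",
        "first_frame_image_url": "https://example.com/dawn.png",    # opening anchor
        "last_frame_image_url": "https://example.com/sunrise.png",  # closing anchor
        "duration": 5,  # seconds, within the 2-15s range
    },
)
print(result["video"]["url"])  # assumed response shape
```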
✗ Avoid this
- Don't plan on self-deployment yet; open weights are expected mid-to-late Q2 2026
- Don't expect multi-shot generation in a single call; clips must be stitched manually
- Avoid packing complex multi-step sequences into a single prompt; coherence can drop beyond 10 seconds
- Don't rely on in-video text rendering; it remains unreliable
- Don't count on easy self-hosting; the 27B total parameter count makes self-deployment resource-intensive when weights ship
Example Prompts
“A bustling Tokyo street at night with neon lights reflecting on wet pavement. A young woman in a leather jacket walks toward the camera. Slow tracking shot, cinematic anamorphic lens flare, moody cyberpunk atmosphere. Rain drizzle, 1080p, 16:9.”
“A chef in a professional kitchen flambéing a pan of shrimp. Dramatic orange flames rise briefly. Close-up from the side, warm tungsten lighting, shallow depth of field. The chef's confident expression visible through the flames.”
“[Edit mode] Change the sky from overcast gray to a vivid sunset with deep orange and purple gradients. Keep all foreground elements — the lighthouse, the rocky coastline, and the waves — unchanged.”
Based on the official prompt guide →
FAQ
How much does Wan 2.7 cost?
From $0.10/sec on FAL.ai. A 5-second video ≈ $0.50.
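A back-of-the-envelope estimate, assuming flat per-second billing at the quoted starting rate (actual pricing may vary by mode, resolution, and provider):

```python
PRICE_PER_SECOND = 0.10  # USD, FAL.ai starting rate quoted above

def estimate_cost(duration_seconds: float) -> float:
    return duration_seconds * PRICE_PER_SECOND

print(f"${estimate_cost(5):.2f}")   # 5-second clip  -> $0.50
print(f"${estimate_cost(15):.2f}")  # 15-second clip -> $1.50 (maximum duration)
```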
Where can I use Wan 2.7?
Via API on FAL.ai and Segmind.
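A minimal text-to-video request through fal_client might look like this; the Wan 2.7 endpoint id and parameter names are assumptions here, so verify them against the FAL.ai (or Segmind) model page before use.

```python
import fal_client  # pip install fal-client; set FAL_KEY in your environment

result = fal_client.subscribe(
    "fal-ai/wan/v2.7/text-to-video",  # assumed endpoint id
    arguments={
        "prompt": (
            "A chef in a professional kitchen flambéing a pan of shrimp. "
            "Close-up from the side, warm tungsten lighting, shallow depth of field. "
            "Cinematic, realistic."
        ),
        "duration": 5,           # seconds
        "aspect_ratio": "16:9",  # assumed parameter name
    },
)
print(result["video"]["url"])    # assumed response shape
```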
How do I get good results with Wan 2.7?
Structure prompts by element — describe subject, style, lighting, and composition as distinct descriptors rather than a single run-on sentence. See the prompt guide above.