Wan 2.7


Alibaba · Mixture-of-Experts Diffusion Transformer · v2.7 · Verified

From $0.10/sec on FAL.ai

Resolution: 1080p
Duration: 2–15s
Providers: 2

Text-to-Video · Image-to-Video · Audio · Camera · Lipsync · V2V

API Pricing

FAL.ai · Wan 2.7 (Cheapest)
Try it →
Text-to-Video: $0.100/s
Image-to-Video: $0.100/s
Reference-to-Video: $0.100/s
Edit-Video: $0.100/s
Verified 2026-04-10
Segmind · Wan 2.7
Try it →
Text-to-Video: $0.63
Text-to-Video: $0.94
Verified 2026-04-10
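At per-second rates like FAL.ai's, a clip's cost is simply duration times rate. A minimal sketch (the helper name and rounding are my own; only the $0.10/sec figure comes from this page):

```python
PRICE_PER_SECOND = 0.10  # FAL.ai rate quoted on this page, USD

def clip_cost(duration_s: float, price_per_s: float = PRICE_PER_SECOND) -> float:
    """Estimate the cost of a single generation at a per-second rate."""
    return round(duration_s * price_per_s, 2)

print(clip_cost(5))   # a 5-second clip -> 0.5
print(clip_cost(15))  # the maximum 15-second duration -> 1.5
```

This matches the FAQ figure below: a 5-second video costs about $0.50.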

Why Wan 2.7?

Strengths

  • Four modes in one architecture — t2v, i2v with frame anchoring, reference-to-video, and instruction editing
  • 27B MoE parameters (14B active) deliver premium quality at competitive $0.10/sec pricing
  • Open source under Apache 2.0 — free commercial use with no royalties or attribution required
  • First/last-frame control enables precise scene transitions unavailable in most competitors
  • Native audio with voice cloning and lip-sync in reference-to-video mode
  • 1080p output at 30fps with 5 aspect ratios covers most production needs

Limitations

  • Open weights not yet available at launch — self-deployment delayed to mid-Q2 2026
  • 27B parameters make self-hosting extremely resource-intensive when weights ship
  • AA Arena ELO of 1,186 (below Wan 2.6) may not fully reflect 2.7 improvements yet
  • No 4K output — maxes at 1080p while some competitors offer 4K
  • Newer model with less community tooling and fine-tuning ecosystem than established alternatives

Prompt Guide

  1. Structure prompts by element — describe subject, style, lighting, and composition as distinct descriptors rather than a single run-on sentence.
  2. Specify camera directions explicitly: 'camera follows,' 'smooth pan left,' 'close-up,' 'aerial descending,' 'dolly zoom' — Wan 2.7 interprets complex camera blocking.
  3. Include style keywords like 'cinematic,' 'cartoon,' 'realistic,' 'anime' to guide the model's visual treatment.
  4. For first/last-frame control in image-to-video, describe the progression between the two anchor frames rather than restating what's in each image.
  5. For instruction-based editing, be specific: 'change the jacket from red to navy' works better than 'make it look different.'
  6. Leverage the 9-grid multi-image input for reference-to-video to maintain character consistency across multiple subjects.

✓ Do this

  • Follow the pattern: Who/What + Where + Movement/Activity + Camera + Style for text-to-video
  • Use endpoint anchors (first and last frame) for precise scene transitions in image-to-video mode
  • For multi-character scenes, use multi-reference grids with distinct visual identities per character
  • Keep environmental context rich: lighting conditions, time of day, weather, atmosphere
  • Use the edit-video mode for iterative refinement — cheaper than regenerating entire clips
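The Who/What + Where + Movement/Activity + Camera + Style pattern above can be sketched as a small helper. This is a sketch, not an official tool; the field names are my own labels for the page's pattern:

```python
def build_prompt(subject: str, setting: str, movement: str,
                 camera: str, style: str) -> str:
    """Join the five prompt elements into one structured prompt string."""
    return ". ".join([subject, setting, movement, camera, style]) + "."

# Rebuilds something close to the cinematic example further down the page.
prompt = build_prompt(
    subject="A young woman in a leather jacket",
    setting="on a bustling Tokyo street at night with neon reflections",
    movement="walks toward the camera through light rain",
    camera="slow tracking shot",
    style="cinematic anamorphic lens flare, moody cyberpunk atmosphere",
)
print(prompt)
```

Keeping each element a separate descriptor, rather than one run-on sentence, follows tip 1 of the prompt guide.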

✗ Avoid this

  • Open weights expected mid-to-late Q2 2026 — self-deployment not yet available at launch
  • No multi-shot generation in a single call — clips must be stitched manually
  • Complex multi-step sequences within a single prompt may lose coherence beyond 10 seconds
  • Text rendering in video remains unreliable
  • Very high parameter count (27B total) makes self-deployment resource-intensive when weights ship

Example Prompts

Cinematic / Urban

A bustling Tokyo street at night with neon lights reflecting on wet pavement. A young woman in a leather jacket walks toward the camera. Slow tracking shot, cinematic anamorphic lens flare, moody cyberpunk atmosphere. Rain drizzle, 1080p, 16:9.

Product / Food

A chef in a professional kitchen flambéing a pan of shrimp. Dramatic orange flames rise briefly. Close-up from the side, warm tungsten lighting, shallow depth of field. The chef's confident expression visible through the flames.

Video Editing

[Edit mode] Change the sky from overcast gray to a vivid sunset with deep orange and purple gradients. Keep all foreground elements — the lighthouse, the rocky coastline, and the waves — unchanged.

Based on the official prompt guide →
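Calling the model through FAL.ai works via the `fal-client` Python SDK (`fal_client.subscribe` is its real entry point). A hedged sketch follows: the endpoint slug and the payload parameter names (`duration`, `aspect_ratio`) are assumptions, so check fal.ai's model gallery for the actual values:

```python
import os

# Hypothetical endpoint ID -- verify the real slug on fal.ai before use.
ENDPOINT = "fal-ai/wan-2.7/text-to-video"

def build_arguments(prompt: str, duration_s: int = 5,
                    aspect_ratio: str = "16:9") -> dict:
    """Assemble the request payload; parameter names are assumptions."""
    return {"prompt": prompt, "duration": duration_s,
            "aspect_ratio": aspect_ratio}

args = build_arguments(
    "A bustling Tokyo street at night with neon lights reflecting "
    "on wet pavement. Slow tracking shot, cinematic.",
    duration_s=5,
)

# Only fire the request when credentials are configured.
if os.environ.get("FAL_KEY"):
    import fal_client  # pip install fal-client
    result = fal_client.subscribe(ENDPOINT, arguments=args)
    print(result)
```

At the page's $0.10/sec rate, this 5-second request would cost roughly $0.50.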

FAQ

How much does Wan 2.7 cost?

From $0.10/sec on FAL.ai. A 5-second video ≈ $0.50.

Where can I use Wan 2.7?

Via API on FAL.ai and Segmind.

How do I get good results with Wan 2.7?

Structure prompts by element — describe subject, style, lighting, and composition as distinct descriptors rather than a single run-on sentence. See the prompt guide above.