LTX-2 Pro
open-source · self-hosted · Best Value
Lightricks · Diffusion Transformer · v2.0 · Verified
Starting from $0.06/sec on FAL.ai
Resolution: 4K
Duration: 6–10s
Providers: 1
API Pricing
Why LTX-2 Pro?
Strengths
- Fully open source (Apache 2.0) — complete weights, training code, and LoRA trainer on GitHub for self-deployment
- Lowest cost with native audio — $0.06/sec at 1080p on FAL.ai includes synchronized audio generation
- Native 4K output at 50 fps — highest resolution and frame rate among open-source video models
- Joint audio-video generation in a single inference pass — accurate lip sync and high audio fidelity
- LoRA fine-tuning support with training completing in under an hour on capable hardware
Limitations
- Limited to 16:9 aspect ratio — no portrait or square output natively supported
- Maximum 10 seconds per generation — requires extension for longer sequences
- No camera control panel, motion brush, or multi-shot generation features
- Currently available only on FAL.ai for managed API access — fewer provider options than competitors
- Text and logo rendering within video is not reliably supported
Prompt Guide
1. Write prompts as a flowing narrative describing a coherent sequence of events unfolding in time, not as a list of visual elements or bullet points.
2. Include five key components: scene anchor (location, time, atmosphere), subject + action (who/what and a verb), camera + lens (movement, focal length, framing), visual style (color science, grading), and motion/time cues (speed, frame intent).
3. Start with close-ups and move outward: the model retains facial and material detail better in tight framing, while wide shots may soften likeness.
4. Use concrete nouns and verbs over vague mood words: LTX-2 weighs specific visual and action terms more heavily than abstract atmosphere descriptions.
5. Match prompt length to duration: 2-second clips need 2-3 sentences, while 10-second clips benefit from 6-8 sentences of detailed direction.
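The five components and the sentence-count guidance above can be sketched as a small helper. This is a minimal illustration, not part of any LTX-2 or FAL.ai tooling: the `ShotPrompt` class, its field names, and `count_sentences` are hypothetical names introduced here, and the sentence counter is a rough heuristic.

```python
import re
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """The five components from the prompt guide (field names are illustrative)."""
    scene_anchor: str    # location, time, atmosphere
    subject_action: str  # who/what and a verb
    camera_lens: str     # movement, focal length, framing
    visual_style: str    # color science, grading
    motion_cues: str     # speed, frame intent

    def to_narrative(self) -> str:
        """Join the non-empty components into one paragraph, one sentence each."""
        parts = [
            self.scene_anchor,
            self.subject_action,
            self.camera_lens,
            self.visual_style,
            self.motion_cues,
        ]
        sentences = []
        for part in parts:
            text = part.strip()
            if not text:
                continue
            # Add a full stop only when the component lacks end punctuation.
            if text[-1] not in ".!?":
                text += "."
            sentences.append(text)
        return " ".join(sentences)


def count_sentences(prompt: str) -> int:
    """Rough count: split on ., ! or ? followed by whitespace or end of text."""
    pieces = re.split(r"[.!?]+(?:\s+|$)", prompt)
    return len([p for p in pieces if p.strip()])


shot = ShotPrompt(
    scene_anchor="A narrow European alley at twilight, lit by a single lamp post",
    subject_action="A street musician sits on a wooden crate and strums an acoustic guitar",
    camera_lens="Camera holds steady in a medium close-up with shallow depth of field",
    visual_style="Warm golden grading with soft film grain",
    motion_cues="His fingers slide slowly along the fretboard as curtains sway",
)
prompt = shot.to_narrative()
sentence_total = count_sentences(prompt)
```

Here `sentence_total` is 5, which sits between the 2-3 sentences suggested for very short clips and the 6-8 suggested for 10-second clips, so a longer clip would want extra sentences of direction added by hand.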
✓ Do this
- For audio-video sync, use cue words like 'on the downbeat,' 'hit on second snare,' or 'cut point at 4s' to align action with generated audio timing
- Compose both wide establishing shots and close-up portraits for the 16:9 frame, the only aspect ratio natively supported
- Leverage LoRA fine-tuning for consistent characters, styles, or brand-specific aesthetics — training completes in under an hour on capable hardware
- Use the Fast tier ($0.04/sec at 1080p) for rapid iteration and the Pro tier ($0.06/sec) for final deliverables with full audio
- Choose 50 fps for smooth slow-motion or high-fidelity motion, and 25 fps for standard cinematic output
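To see what the Fast-then-Pro workflow costs, the arithmetic can be sketched as follows. The per-second rates are the 1080p figures quoted on this page and may change; the function names and the split into drafts plus one final render are assumptions made for illustration.

```python
from decimal import Decimal

# Per-second 1080p rates quoted on this page for FAL.ai.
# Treat them as a snapshot and check current pricing before budgeting.
RATES_PER_SECOND = {
    "fast": Decimal("0.04"),
    "pro": Decimal("0.06"),
}


def clip_cost(seconds: int, tier: str) -> Decimal:
    """Cost in USD of one clip of the given length on the given tier."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return RATES_PER_SECOND[tier] * seconds


def project_cost(draft_count: int, draft_seconds: int, final_seconds: int) -> Decimal:
    """Cost of iterating on the Fast tier, then rendering one final on Pro."""
    drafts = clip_cost(draft_seconds, "fast") * draft_count
    final = clip_cost(final_seconds, "pro")
    return drafts + final


# Eight 6-second drafts on Fast, then one 10-second final on Pro.
total = project_cost(draft_count=8, draft_seconds=6, final_seconds=10)
print(f"${total}")  # prints "$2.52"
```

`Decimal` is used instead of floats so that the cents add up exactly. In this example the drafts come to $1.92 and the final render to $0.60.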
✗ Avoid this
- Asking for readable text or logos inside the frame: the model cannot render them reliably
- Overloading a prompt with many simultaneous elements: results get worse, so keep scenes simple and directed
- Requesting portrait (9:16) or square (1:1) output: only 16:9 is natively supported
- Expecting a camera control panel or motion brush: all direction goes through the text prompt
- Planning a single generation longer than 10 seconds: longer content requires extension workflows
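The limits listed on this page can be checked before a request is sent. The sketch below is a pre-flight check only: `ClipRequest` and its field names are hypothetical and do not correspond to the parameters of any real API, and treating 25 and 50 fps as the only allowed frame rates is an assumption based on the two values this page mentions.

```python
from dataclasses import dataclass

# Limits as stated on this page (duration 6-10 s, 16:9 only).
# The fps set is an assumption: the page mentions 25 and 50 fps.
MIN_SECONDS, MAX_SECONDS = 6, 10
SUPPORTED_ASPECT_RATIOS = {"16:9"}
SUPPORTED_FPS = {25, 50}


@dataclass
class ClipRequest:
    """Hypothetical request shape, used only for this illustration."""
    prompt: str
    seconds: int
    aspect_ratio: str = "16:9"
    fps: int = 25


def find_problems(request: ClipRequest) -> list[str]:
    """Return a human-readable list of limit violations (empty if none)."""
    problems = []
    if not MIN_SECONDS <= request.seconds <= MAX_SECONDS:
        problems.append(
            f"duration {request.seconds}s is outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    if request.aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        problems.append(f"aspect ratio {request.aspect_ratio} is not supported")
    if request.fps not in SUPPORTED_FPS:
        problems.append(f"frame rate {request.fps} fps is not supported")
    return problems


ok = find_problems(ClipRequest(prompt="A koi pond in autumn.", seconds=8))
bad = find_problems(
    ClipRequest(prompt="A koi pond in autumn.", seconds=12, aspect_ratio="9:16", fps=30)
)
```

Here `ok` is an empty list, while `bad` holds three messages, one for each violated limit. Catching these locally avoids paying for a request that would be rejected or silently adjusted.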
Example Prompts
“A street musician sits on a wooden crate in a narrow European alley at twilight. He strums an acoustic guitar, fingers sliding along the fretboard. The warm golden light from a lamp post casts long shadows. A gentle breeze rustles nearby café curtains. Camera holds steady, medium close-up, shallow depth of field. Guitar melody fills the alley.”
“Aerial establishing shot slowly descending over a Japanese garden in autumn. Crimson maple leaves drift across a still koi pond reflecting the overcast sky. Camera tilts down as it descends, revealing stone lanterns along a gravel path. Soft ambient wind and water sounds.”
“Close-up of a woman's face as she opens her eyes, surprised. Her pupils dilate as warm morning light streams through gauze curtains. Camera rack-focuses from her eyes to the window behind her. A clock ticks softly. Film grain, 35mm aesthetic, warm color grading.”
Based on the official prompt guide.
FAQ
How much does LTX-2 Pro cost?
From $0.06/sec on FAL.ai. A 6-second video costs about $0.36 and a 10-second video about $0.60.
Where can I use LTX-2 Pro?
Via API on FAL.ai.
How do I get good results with LTX-2 Pro?
Write prompts as a flowing narrative describing a coherent sequence of events unfolding in time, not as a list of visual elements or bullet points. See the prompt guide above.