LTX-2 Pro

open-source · self-hosted · Best Value

Lightricks · Diffusion Transformer · v2.0 · Verified

Starting from $0.06/sec on FAL.ai

Resolution

Duration: 6–10s

Providers

Text-to-Video · Image-to-Video · Audio · Lipsync

API Pricing

FAL.ai · Pro · Cheapest


Text-to-Video · Audio: $0.060/s
Text-to-Video · Audio: $0.120/s
Text-to-Video · Audio: $0.240/s
Image-to-Video · Audio: $0.060/s
Image-to-Video · Audio: $0.120/s
Image-to-Video · Audio: $0.240/s

Verified 2026-04-10
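
As a rough worked example, the per-second rates listed above can be turned into per-clip costs by multiplying by clip length. This is a minimal sketch, not an official calculator: the rate values are taken from the listing, while the function name `clip_cost` is hypothetical. The listing does not say what distinguishes the three rates, so they are kept as unlabeled values here.

```python
from decimal import Decimal

# Per-second rates as listed for FAL.ai (USD). The listing does not label
# what distinguishes the three rates, so they are kept as plain values.
LISTED_RATES = [Decimal("0.060"), Decimal("0.120"), Decimal("0.240")]


def clip_cost(rate_per_sec: Decimal, seconds: int) -> Decimal:
    """Cost of one generation: per-second rate times clip length."""
    return rate_per_sec * seconds


if __name__ == "__main__":
    for rate in LISTED_RATES:
        for seconds in (6, 10):  # the listed duration range is 6-10 s
            print(f"${rate}/s x {seconds}s = ${clip_cost(rate, seconds):.2f}")
```

`Decimal` is used instead of `float` so that small currency amounts multiply without binary rounding noise.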

Why LTX-2 Pro?

Strengths

Fully open source (Apache 2.0) — complete weights, training code, and LoRA trainer on GitHub for self-deployment
Lowest cost with native audio — $0.06/sec at 1080p on FAL.ai includes synchronized audio generation
Native 4K output at 50 fps — highest resolution and frame rate among open-source video models
Joint audio-video generation in a single inference pass — accurate lip sync and high audio fidelity
LoRA fine-tuning support with training completing in under an hour on capable hardware

Limitations

Limited to 16:9 aspect ratio — no portrait or square output natively supported
Maximum 10 seconds per generation — requires extension for longer sequences
No camera control panel, motion brush, or multi-shot generation features
Currently available only on FAL.ai for managed API access — fewer provider options than competitors
Text and logo rendering within video is not reliably supported
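
Because each generation tops out at 10 seconds, longer sequences have to be planned as several clips. The sketch below only works out clip lengths. It assumes a longer piece is assembled from consecutive generations, and it says nothing about how clips are extended or stitched, which the listing does not describe. The function name `plan_segments` is hypothetical.

```python
import math

MAX_CLIP_SECONDS = 10  # per-generation ceiling stated in the limitations


def plan_segments(total_seconds: int, max_clip: int = MAX_CLIP_SECONDS) -> list[int]:
    """Split a target runtime into the fewest clips, each at most max_clip long.

    Clip lengths are balanced so no segment is much shorter than the others.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    count = math.ceil(total_seconds / max_clip)
    base, extra = divmod(total_seconds, count)
    # The first `extra` clips get one additional second each.
    return [base + 1] * extra + [base] * (count - extra)


if __name__ == "__main__":
    print(plan_segments(25))  # three clips that sum to 25 seconds
```

Balancing the lengths, rather than emitting 10 + 10 + 5, keeps every clip long enough to carry a complete beat of the scene; that is a design preference, not a requirement of the model.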

Prompt Guide

1. Write prompts as a flowing narrative describing a coherent sequence of events unfolding in time, not as a list of visual elements or bullet points.
2. Include five key components: scene anchor (location, time, atmosphere), subject + action (who/what and a verb), camera + lens (movement, focal length, framing), visual style (color science, grading), and motion/time cues (speed, frame intent).
3. Start with close-ups and move outward: the model retains facial and material detail better in tight framing, while wide shots may soften likeness.
4. Use concrete nouns and verbs over vague mood words: LTX-2 weighs specific visual and action terms more heavily than abstract atmosphere descriptions.
5. Match prompt length to duration: 2-second clips need 2-3 sentences, while 10-second clips benefit from 6-8 sentences of detailed direction.
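
The five components and the length guideline above can be sketched as a small helper. Everything here is illustrative: the `ShotPrompt` class, its field names, and both functions are hypothetical, and the linear interpolation between the two stated anchor points (2 s and 10 s) is an assumption, since the guide gives only those two points.

```python
import re
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    scene_anchor: str    # location, time, atmosphere
    subject_action: str  # who/what and a verb
    camera_lens: str     # movement, focal length, framing
    visual_style: str    # color science, grading
    motion_cues: str     # speed, frame intent

    def render(self) -> str:
        """Join the five components into one flowing paragraph."""
        parts = [self.scene_anchor, self.subject_action, self.camera_lens,
                 self.visual_style, self.motion_cues]
        return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())


def count_sentences(prompt: str) -> int:
    """Rough count: split on ., ! or ? followed by whitespace or end of text."""
    return len([s for s in re.split(r"[.!?]+(?:\s+|$)", prompt) if s.strip()])


def suggested_sentence_range(seconds: float) -> tuple[int, int]:
    """Interpolate between the guide's anchors (interpolation is an assumption):
    2 s -> 2-3 sentences, 10 s -> 6-8 sentences."""
    t = (min(max(seconds, 2), 10) - 2) / 8
    return round(2 + t * 4), round(3 + t * 5)


if __name__ == "__main__":
    prompt = ShotPrompt(
        scene_anchor="A narrow European alley at twilight",
        subject_action="A street musician strums an acoustic guitar",
        camera_lens="Camera holds steady, medium close-up, shallow depth of field",
        visual_style="Warm golden grading",
        motion_cues="Slow, unhurried motion",
    ).render()
    low, high = suggested_sentence_range(10)
    print(prompt)
    print(f"{count_sentences(prompt)} sentences; guide suggests {low}-{high} for 10 s")
```

A five-sentence prompt like this one falls short of the suggested range for a 10-second clip, which is a cue to add more sequential detail rather than more adjectives.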

✓ Do this

For audio-video sync, use cue words like 'on the downbeat,' 'hit on second snare,' or 'cut point at 4s' to align action with generated audio timing
Compose for 16:9, the only natively supported aspect ratio: use wide framing for establishing shots and tight framing for close-up portraits
Leverage LoRA fine-tuning for consistent characters, styles, or brand-specific aesthetics — training completes in under an hour on capable hardware
Use the Fast tier ($0.04/sec at 1080p) for rapid iteration and the Pro tier ($0.06/sec) for final deliverables with full audio
Choose 50 fps for smooth slow-motion or high-fidelity motion, and 25 fps for standard cinematic output
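
To illustrate the Fast-then-Pro workflow from the tips above, the sketch below compares drafting on the Fast tier against rendering every attempt on the Pro tier. The two rates come from the tip itself; the function names and the draft count in the example are hypothetical.

```python
from decimal import Decimal

FAST_RATE = Decimal("0.04")  # USD per second, Fast tier at 1080p (from the tip)
PRO_RATE = Decimal("0.06")   # USD per second, Pro tier (from the tip)


def iteration_budget(drafts: int, seconds: int) -> Decimal:
    """Total for `drafts` Fast-tier attempts plus one Pro-tier final render."""
    return FAST_RATE * seconds * drafts + PRO_RATE * seconds


def all_pro_budget(drafts: int, seconds: int) -> Decimal:
    """Total if every draft and the final render all use the Pro tier."""
    return PRO_RATE * seconds * (drafts + 1)


if __name__ == "__main__":
    drafts, seconds = 5, 10  # hypothetical: five drafts of a 10-second clip
    print(f"Fast drafts + Pro final: ${iteration_budget(drafts, seconds):.2f}")
    print(f"Pro for everything:      ${all_pro_budget(drafts, seconds):.2f}")
```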

✗ Avoid this

Relying on readable text or logos within video frames; the model cannot render them reliably
Overloading prompts with too many simultaneous elements, which produces worse results; focus on simpler, directed scenes
Requesting portrait (9:16) or square (1:1) output; the aspect ratio is limited to 16:9
Expecting a camera control panel or motion brush; all direction is through text prompts
Planning single generations longer than 10 seconds; longer content requires extension workflows

Example Prompts

Music / Audio-Visual

“A street musician sits on a wooden crate in a narrow European alley at twilight. He strums an acoustic guitar, fingers sliding along the fretboard. The warm golden light from a lamp post casts long shadows. A gentle breeze rustles nearby café curtains. Camera holds steady, medium close-up, shallow depth of field. Guitar melody fills the alley.”

Landscape / Nature

“Aerial establishing shot slowly descending over a Japanese garden in autumn. Crimson maple leaves drift across a still koi pond reflecting the overcast sky. Camera tilts down as it descends, revealing stone lanterns along a gravel path. Soft ambient wind and water sounds.”

Cinematic / Character

“Close-up of a woman's face as she opens her eyes, surprised. Her pupils dilate as warm morning light streams through gauze curtains. Camera rack-focuses from her eyes to the window behind her. A clock ticks softly. Film grain, 35mm aesthetic, warm color grading.”

Based on the official prompt guide →

FAQ

How much does LTX-2 Pro cost?

From $0.06/sec on FAL.ai. A 5-second video ≈ $0.30.

Where can I use LTX-2 Pro?

Via API on FAL.ai.

How do I get good results with LTX-2 Pro?

Write prompts as a flowing narrative describing a coherent sequence of events unfolding in time, not as a list of visual elements or bullet points. See the prompt guide above.