Mochi 1

research · prototyping · open-source

Genmo · Asymmetric Diffusion Transformer (AsymmDiT) · v1.0 · Verified

—/sec

starting from, on FAL.ai

Resolution

480p

Duration

~5s (fixed)

Providers

Text-to-Video

API Pricing

FAL.ai · Mochi 1 · Text-to-Video · $0.40 · Verified 2026-04-10 · Cheapest

Replicate · Mochi 1 · Text-to-Video · $0.42 · Verified 2026-04-10
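For rough budgeting, the two listed prices can be turned into a batch estimate. The sketch below assumes each price is charged per generated clip, which the listing does not state; the function name and the provider keys are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# Prices copied from the listing; the per-clip billing unit is an assumption.
PRICES = {"fal": Decimal("0.40"), "replicate": Decimal("0.42")}


def batch_cost(provider: str, clips: int) -> Decimal:
    """Estimate the USD cost of generating `clips` videos on one provider."""
    if clips < 0:
        raise ValueError("clips must be non-negative")
    return PRICES[provider] * clips


if __name__ == "__main__":
    for name in PRICES:
        print(name, batch_cost(name, 50))
```

Decimal is used instead of float so that small per-clip prices add up without rounding drift.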

Why Mochi 1?

Strengths

Fully open source under Apache 2.0 — free to use, modify, and deploy commercially
Exceptional photorealistic motion quality — realistic physics for fluid, fur, hair, and human movement
Strong prompt adherence — closely follows user instructions when prompts are clear
At 10B parameters, it was the largest openly released video model at launch, and the open weights allow fine-tuning for custom domains
128x latent compression (8x8 spatial, 6x temporal) enables efficient training and inference
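To make the per-axis compression factors concrete, the sketch below computes the latent grid size for one clip. The 8x spatial and 6x temporal factors come from the list above; the input size of 163 frames at 848x480 and the use of ceiling division are assumptions for illustration, not figures stated in this listing. Note that the per-axis factors multiply to 384; the listing does not explain how that maps to the quoted 128x figure, so the sketch only reports sizes per axis.

```python
import math


def latent_shape(frames: int, height: int, width: int,
                 t_factor: int = 6, s_factor: int = 8) -> tuple:
    """Return (latent_frames, latent_height, latent_width).

    Ceiling division is an assumption about how partial blocks are
    handled; the real encoder may pad or crop differently.
    """
    return (
        math.ceil(frames / t_factor),
        math.ceil(height / s_factor),
        math.ceil(width / s_factor),
    )


# Product of the per-axis factors stated in the listing (8 * 8 * 6).
NOMINAL_POSITION_REDUCTION = 8 * 8 * 6

if __name__ == "__main__":
    # Hypothetical clip: 163 frames at 848x480.
    print(latent_shape(163, 480, 848))
    print(NOMINAL_POSITION_REDUCTION)
```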

Limitations

480p output only — significantly lower resolution than commercial competitors
Fixed ~5.4 second duration with no flexibility for shorter or longer clips
Text-to-video only — no image-to-video, video-to-video, or audio generation
Requires substantial GPU resources (4x H100) for self-hosting without optimization
Elo rating of 1,000 places it around the median; most commercial models rank higher on quality

Prompt Guide

1. Anchor the scene by specifying subject, setting, time of day, and camera. Mochi excels when all four elements are explicit.
2. Define motion and pacing with terms like 'slow pan left,' 'handheld jitter,' 'smooth crane up.' The model simulates realistic physics from clear motion cues.
3. State composition and lens details: 'wide establishing shot,' 'close-up portrait,' 'macro product hero,' 'anamorphic flare, 35mm, f/2.8.'
4. Avoid contradictions. Keep style and motion cues consistent; mixing too many styles (anime + Pixar + watercolor + photoreal) introduces artifacts.
5. Iterate with small prompt edits. Small changes often yield big improvements, so avoid rewriting everything at once.
6. Photorealism is the sweet spot. Stylized or animated looks may require fine-tuning or more careful prompt engineering.
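The first three tips can be sketched as a small prompt builder that refuses to produce a prompt until subject, setting, time of day, and camera are all filled in. The `Shot` class and its field names are hypothetical helpers for illustration; they are not part of any Mochi or provider API.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # The four anchors the guide asks for.
    subject: str
    setting: str
    time_of_day: str
    camera: str
    # Optional motion and lens cues.
    motion: str = ""
    lens: str = ""

    def to_prompt(self) -> str:
        required = {
            "subject": self.subject,
            "setting": self.setting,
            "time_of_day": self.time_of_day,
            "camera": self.camera,
        }
        missing = [name for name, value in required.items() if not value.strip()]
        if missing:
            raise ValueError(f"missing scene anchors: {', '.join(missing)}")
        parts = [
            f"{self.subject} in {self.setting}, {self.time_of_day}",
            self.camera,
            self.motion,
            self.lens,
        ]
        # Drop empty optional parts and capitalize the start of each sentence.
        sentences = [p.strip()[0].upper() + p.strip()[1:] for p in parts if p.strip()]
        return ". ".join(sentences) + "."


if __name__ == "__main__":
    shot = Shot(
        subject="A woman with curly hair",
        setting="a rainy city street",
        time_of_day="twilight",
        camera="handheld camera",
        motion="slow pan left",
        lens="35mm, f/2.8",
    )
    print(shot.to_prompt())
```

Keeping each cue in its own field also makes tip 5 easier to follow, since a single field can be changed between runs while the rest of the prompt stays fixed.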

✓ Do this

Use prompt expansion (enabled by default on FAL.ai) to let the model augment your description for better results
Use negative prompts to exclude unwanted elements: 'blurry, low quality, distorted, watermark'
Set a seed value for reproducibility when iterating on prompt variations
Keep prompts clear and concise — Mochi's strong prompt adherence rewards specificity over length
Focus on single continuous actions rather than complex multi-step sequences within one clip
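The points above can be combined into a request payload that fixes the seed while only the prompt wording changes. This is a minimal sketch with no network call: the field names `negative_prompt`, `seed`, and `enable_prompt_expansion` are assumptions about a provider's input schema and should be checked against the provider's own documentation before use.

```python
DEFAULT_NEGATIVE = "blurry, low quality, distorted, watermark"


def build_request(prompt: str, seed: int,
                  negative_prompt: str = DEFAULT_NEGATIVE,
                  expand_prompt: bool = True) -> dict:
    """Assemble a request body; all keys are assumed, not confirmed."""
    return {
        "prompt": prompt.strip(),
        "negative_prompt": negative_prompt,
        "seed": seed,
        "enable_prompt_expansion": expand_prompt,
    }


def variations(base_prompt: str, edits: list, seed: int) -> list:
    """Build one request per small edit, reusing the same seed so that
    differences between outputs come from the prompt change alone."""
    return [build_request(f"{base_prompt} {edit}", seed) for edit in edits]


if __name__ == "__main__":
    base = "An overhead shot of ocean waves crashing against dark volcanic rocks."
    for req in variations(base, ["Drone camera, wide angle.", "Slow crane up."], seed=1234):
        print(req)
```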

✗ Avoid this

Treating the 480p output as production-quality final footage without upscaling
Planning clips shorter or longer than the fixed ~5.4 seconds
Expecting image-to-video or video-to-video modes; only text-to-video is supported
Prompting for animated or stylized content; the model is optimized for photorealism
Extreme motion scenarios, which may produce minor warping and distortions
Self-deploying without budgeting for GPU memory: the reference setup uses 4x H100 GPUs (~60GB VRAM), though community optimizations reduce this to ~20GB

Example Prompts

Nature / Macro

“A slow-motion close-up of a hummingbird hovering near a bright red flower, wings beating rapidly. Shallow depth of field, morning dew on the petals, soft bokeh background. Natural daylight, macro lens, 85mm f/1.4.”

Cinematic / Urban

“A woman with curly hair walks through a rainy city street at twilight, holding a transparent umbrella. Neon signs reflect in puddles on the asphalt. Handheld camera, cinematic color grading, moody atmosphere.”

Landscape / Nature

“An overhead shot of ocean waves crashing against dark volcanic rocks. White foam swirls around the stone. Slow, hypnotic rhythm. Drone camera, wide angle, cool blue color palette.”

Based on the official prompt guide.

FAQ

Where can I use Mochi 1?

Via API on FAL.ai and Replicate.

How do I get good results with Mochi 1?

Anchor the scene by specifying subject, setting, time of day, and camera. Mochi excels when all four elements are explicit. See the prompt guide above.