Mochi 1

research · prototyping · open-source

Genmo · Asymmetric Diffusion Transformer (AsymmDiT) · v1.0 · Verified

/sec

starting from, on FAL.ai

Resolution: 480p

Duration: about 5s (fixed)

Providers: 2

Text-to-Video

API Pricing

FAL.ai · Mochi 1 · Text-to-Video · $0.40 · Verified 2026-04-10 · Cheapest
Replicate · Mochi 1 · Text-to-Video · $0.42 · Verified 2026-04-10
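
The listing above can be compared programmatically. The following is a minimal sketch, assuming each listed price is a cost per billing unit; the page does not state whether that unit is a clip or a second, so `units` is left generic. The function names and the batch size of 25 are illustrative only.

```python
from decimal import Decimal

# Prices copied from the listing above. The billing unit is not stated
# on the page, so "unit" here is an assumption (a clip or a second).
PRICES = {"FAL.ai": Decimal("0.40"), "Replicate": Decimal("0.42")}


def cheapest(prices):
    """Return (provider, unit_price) for the lowest listed price."""
    provider = min(prices, key=prices.get)
    return provider, prices[provider]


def batch_cost(unit_price, units):
    """Total cost for `units` billing units at the given unit price."""
    return unit_price * units


provider, unit_price = cheapest(PRICES)
total = batch_cost(unit_price, 25)  # 25 is an arbitrary illustrative batch size
```

`Decimal` is used instead of floats so that currency arithmetic stays exact.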

Why Mochi 1?

Strengths

  • Fully open source under Apache 2.0 — free to use, modify, and deploy commercially
  • Exceptional photorealistic motion quality — realistic physics for fluid, fur, hair, and human movement
  • Strong prompt adherence — closely follows user instructions when prompts are clear
  • At 10B parameters, it was the largest openly released video model at launch, and the open weights allow fine-tuning for custom domains
  • 128x latent compression (8x8 spatial, 6x temporal) enables efficient training and inference
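
To make the per-axis compression factors concrete, here is a rough sketch of how they shrink the grid the transformer works on. It assumes an 848x480 frame size and 30 fps, neither of which is stated on this page, and uses ceiling division as a simplification. How the headline 128x figure is derived is not specified here, so the sketch uses only the 8x8 spatial and 6x temporal factors.

```python
import math

SPATIAL = 8   # per-axis spatial compression, from the list above
TEMPORAL = 6  # temporal compression, from the list above


def latent_grid(width, height, frames):
    """Approximate latent grid (width, height, time) via ceiling division."""
    return (
        math.ceil(width / SPATIAL),
        math.ceil(height / SPATIAL),
        math.ceil(frames / TEMPORAL),
    )


def position_reduction():
    """How many pixel positions map onto one latent position."""
    return SPATIAL * SPATIAL * TEMPORAL


# Assumed input: 848x480 frames, 30 fps, ~5.4 s clip (162 frames).
frames = round(5.4 * 30)
grid = latent_grid(848, 480, frames)
```

Under these assumptions, `grid` is `(106, 60, 27)` and each latent position covers 384 pixel positions.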

Limitations

  • 480p output only — significantly lower resolution than commercial competitors
  • Fixed ~5.4 second duration with no flexibility for shorter or longer clips
  • Text-to-video only — no image-to-video, video-to-video, or audio generation
  • Requires substantial GPU resources (4x H100) for self-hosting without optimization
  • Elo score of 1,000 places it at the median; most commercial models outperform it on quality
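
Because the clip length is fixed, longer sequences have to be planned as several separate generations. The sketch below estimates how many clips a target runtime needs, assuming the ~5.4 second figure above and ignoring any overlap or trimming between clips; `clips_needed` is a hypothetical helper name.

```python
import math

CLIP_SECONDS = 5.4  # approximate fixed clip length quoted above


def clips_needed(target_seconds):
    """Number of fixed-length clips required to cover a target runtime."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return math.ceil(target_seconds / CLIP_SECONDS)


thirty_second_spot = clips_needed(30)
```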

Prompt Guide

  1. Anchor the scene by specifying subject, setting, time of day, and camera. Mochi excels when all four elements are explicit.
  2. Define motion and pacing with terms like 'slow pan left,' 'handheld jitter,' or 'smooth crane up.' The model simulates realistic physics from clear motion cues.
  3. State composition and lens details: 'wide establishing shot,' 'close-up portrait,' 'macro product hero,' 'anamorphic flare, 35mm, f/2.8.'
  4. Avoid contradictions: keep style and motion cues consistent. Overly mixed styles (anime + Pixar + watercolor + photoreal) introduce artifacts.
  5. Iterate with small prompt edits, which often yield big improvements; avoid rewriting everything at once.
  6. Photorealism is the sweet spot. Stylized or animated looks may require fine-tuning or more careful prompt engineering.
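
One way to apply the first three steps consistently is to assemble prompts from named fields, so that a missing subject, setting, time of day, or camera is caught before a generation is paid for. This is a minimal sketch; the `Shot` class and its field names are hypothetical and not part of any Mochi or provider API.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # The four anchors from step 1 are required.
    subject: str
    setting: str
    time_of_day: str
    camera: str
    # Motion (step 2) and lens details (step 3) are optional.
    motion: str = ""
    lens: str = ""

    def to_prompt(self) -> str:
        required = {
            "subject": self.subject,
            "setting": self.setting,
            "time_of_day": self.time_of_day,
            "camera": self.camera,
        }
        missing = [name for name, value in required.items() if not value.strip()]
        if missing:
            raise ValueError(f"missing scene anchors: {', '.join(missing)}")
        scene = f"{self.subject} in {self.setting} at {self.time_of_day}"
        parts = [scene, self.camera, self.motion, self.lens]
        # Drop empty optional parts and join the rest as short sentences.
        return ". ".join(p.strip() for p in parts if p.strip()) + "."


shot = Shot(
    subject="A woman with curly hair walking",
    setting="a rainy city street",
    time_of_day="twilight",
    camera="Handheld camera",
    motion="Slow pan left",
    lens="35mm, f/2.8",
)
prompt = shot.to_prompt()
```

Keeping each cue in its own field also makes step 5 easier: change one field, regenerate, and compare.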

✓ Do this

  • Use prompt expansion (enabled by default on FAL.ai) to let the model augment your description for better results
  • Use negative prompts to exclude unwanted elements: 'blurry, low quality, distorted, watermark'
  • Set a seed value for reproducibility when iterating on prompt variations
  • Keep prompts clear and concise — Mochi's strong prompt adherence rewards specificity over length
  • Focus on single continuous actions rather than complex multi-step sequences within one clip
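
The tips above can be combined into a single request body. The sketch below only builds and serializes a payload and makes no network call. The field names (`prompt`, `negative_prompt`, `seed`, `enable_prompt_expansion`) are assumptions for illustration and should be checked against the provider's own API reference before use.

```python
import json

DEFAULT_NEGATIVE = "blurry, low quality, distorted, watermark"


def build_request(prompt, seed=None, negative_prompt=DEFAULT_NEGATIVE,
                  expand_prompt=True):
    """Build a request payload; field names are assumed, not confirmed."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    payload = {
        "prompt": prompt.strip(),
        "negative_prompt": negative_prompt,
        "enable_prompt_expansion": expand_prompt,
    }
    if seed is not None:
        # Reusing one seed across prompt variants isolates the prompt's effect.
        payload["seed"] = int(seed)
    return payload


base = build_request("Ocean waves crashing against dark volcanic rocks.", seed=42)
variant = build_request(
    "Ocean waves crashing against dark volcanic rocks. Drone camera, wide angle.",
    seed=42,
)
body = json.dumps(base)
```

Holding the seed fixed while editing only the prompt makes it easier to attribute a change in the output to the prompt edit.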

✗ Avoid this

  • Treating raw output as a production-quality final: it is capped at 480p and needs upscaling first
  • Planning shots shorter or longer than the fixed ~5.4 second duration
  • Expecting image-to-video or video-to-video modes: the model is text-to-video only
  • Animated or stylized content: the model is optimized for photorealism and does not perform well here
  • Extreme motion scenarios, which may produce minor warping and distortions
  • Self-hosting without checking hardware: deployment requires 4x H100 GPUs (~60GB VRAM), though community optimizations reduce this to ~20GB

Example Prompts

Nature / Macro

A slow-motion close-up of a hummingbird hovering near a bright red flower, wings beating rapidly. Shallow depth of field, morning dew on the petals, soft bokeh background. Natural daylight, macro lens, 85mm f/1.4.

Cinematic / Urban

A woman with curly hair walks through a rainy city street at twilight, holding a transparent umbrella. Neon signs reflect in puddles on the asphalt. Handheld camera, cinematic color grading, moody atmosphere.

Landscape / Nature

An overhead shot of ocean waves crashing against dark volcanic rocks. White foam swirls around the stone. Slow, hypnotic rhythm. Drone camera, wide angle, cool blue color palette.

Based on the official prompt guide.

FAQ

Where can I use Mochi 1?

Via API on FAL.ai and Replicate.

How do I get good results with Mochi 1?

Anchor the scene by specifying subject, setting, time of day, and camera; Mochi excels when all four elements are explicit. See the Prompt Guide above.