Open Source AI Video Models: Self-Host vs API (2026)
Feature deep dive · 8 min read


HappyHorse 1.0 is #1 on the Arena and open source. Wan 2.7 is Apache 2.0. FramePack runs on 6GB VRAM. Complete guide to self-hosting vs API pricing.

By VidScore Team

The #1 AI video model on the Artificial Analysis Arena is open source. HappyHorse 1.0 debuted at an ELO of 1,347, 124 points ahead of the next model, with an Apache-like commercial license and 15B parameters. Of the 27 models we track, 8 are open source, and the cheapest costs just $0.02/sec via API. The quality gap between open and closed-source AI video has effectively closed.

We compared every open-source AI video model on license terms, hardware requirements, API pricing, and self-hosting viability. Whether you want the cheapest API option or full control by running models on your own GPUs, this guide covers the complete landscape.

Prices verified: April 11, 2026.

All Open-Source AI Video Models

| Model | License | Parameters | VRAM Needed | API Price ($/sec) | Self-Host Viable? |
|---|---|---|---|---|---|
| HappyHorse 1.0 | Apache-like (commercial) | 15B | TBD (weights coming) | No API yet | Yes (when weights release) |
| Wan 2.7 | Apache 2.0 | 27B (MoE) | 80GB+ (H100) | $0.10 | Enterprise GPU only |
| LTX-2 Pro | Apache 2.0 | 14B | 24–48GB | $0.06 | Yes (A5000/A6000+) |
| HunyuanVideo 1.5 | Open (Tencent) | 8.3B | 14GB | $0.02 (WaveSpeed) | Yes (RTX 4090) |
| FramePack | Open | — | 6GB | $0.033 | Yes (RTX 3060) |
| Mochi 1 | Apache 2.0 | 10B | 24GB | $0.40/clip | Yes (A5000+) |
| CogVideoX-5B | Custom (commercial OK) | 5B | 16GB | $0.20/clip | Yes (RTX 4090) |
| SkyReels V4 | Open-source lineage | TBD | TBD (V4 weights pending) | $0.12 | Pending (V4 weights expected) |

Model Deep Dives

HappyHorse 1.0 — #1 Overall, Open Source

HappyHorse 1.0 from ATH-AI is the highest-quality AI video model available, period. Its Arena ELOs of 1,347 (text-to-video) and 1,406 (image-to-video) surpass every commercial model. The Apache-like license allows commercial use. It features 7-language lip-sync and 1080p output.

The catch: model weights are “coming soon” and there’s no API access yet. You can test it through ATH-AI’s demo interface. Once weights are released, this will become the most important self-hosting target in AI video.

Wan 2.7 — Most Capable Open-Source Architecture

Wan 2.7 from Alibaba is the most fully-featured open-source model with API access. At 27B parameters using a Mixture-of-Experts (MoE) architecture, it supports four generation modes: text-to-video, image-to-video, video-to-video, and audio-driven generation. Apache 2.0 license. API pricing starts at $0.10/sec on FAL.ai. Full model weights are expected mid-Q2 2026.

LTX-2 Pro — Best Balance of Quality, Price, and Features

LTX-2 Pro is the open-source model we recommend most often. At $0.06/sec with audio included, it's the cheapest 1080p + audio option on the market. Apache 2.0 license, 14B parameters, native 4K support ($0.24/sec at 4K), and lip-sync capability. Its 50fps output is smooth enough for slow-motion content.

HunyuanVideo 1.5 — Cheapest API Option

HunyuanVideo 1.5 from Tencent is the absolute cheapest AI video model via API at $0.02/sec on WaveSpeed. At 8.3B parameters and 14GB VRAM, it’s also one of the most accessible for self-hosting. The trade-off is 480p max resolution and no audio. Best for high-volume generation where quality is secondary to cost.

FramePack — Best for Consumer Hardware

FramePack is the standout for self-hosting on consumer GPUs. It runs on just 6GB VRAM (an RTX 3060 at ~$300) and can generate videos up to 120 seconds, far longer than any other model. Output is 640×640 with no audio. The API price on FAL.ai is $0.033/sec, but self-hosting on a consumer GPU eliminates per-second costs entirely.

Mochi 1 — Apache 2.0 Pioneer

Mochi 1 from Genmo was one of the first high-quality open-source video models. At 10B parameters with an Apache 2.0 license, it produces 480p output. API pricing is $0.40 per clip (flat rate). Mochi 1 laid the groundwork for the open-source video generation ecosystem, though newer models like LTX-2 Pro and Wan 2.7 have surpassed it on quality.

CogVideoX-5B — Lightweight Option

CogVideoX-5B is the smallest model in this comparison at 5B parameters. Its custom license permits commercial use. At 16GB VRAM, it runs on an RTX 4090 or equivalent. Output is 480p with API pricing at $0.20 per clip. Best suited as a starting point for teams exploring self-hosted video generation with limited GPU resources.

SkyReels V4 — Open-Source Lineage, Weights Pending

SkyReels V4 from Skywork AI builds on open-source foundations and offers API access at $0.12/sec with audio included. It's the first unified multi-modal video foundation model. V4 weights are pending release — once available, self-hosting will be an option for teams already using the API.

Self-Host Economics: When to Run Your Own GPU

Self-hosting eliminates per-second API costs but introduces fixed infrastructure costs. Here’s when it makes financial sense.

| Setup | Hardware Cost | Models Supported | API Equivalent | Breakeven |
|---|---|---|---|---|
| Consumer: RTX 3060 | ~$300 (one-time) | FramePack | $0.033/sec (FAL.ai) | ~180 clips (5s each) |
| Consumer: RTX 4090 | ~$1,600 (one-time) | FramePack, HunyuanVideo 1.5, CogVideoX | $0.02–$0.033/sec | ~300–500 clips |
| Pro: A5000/A6000 | ~$4,000 (one-time) | All above + LTX-2 Pro, Mochi 1 | $0.06/sec (LTX-2 Pro) | ~250 clips at LTX-2 Pro rates |
| Cloud: H100 rental | ~$2/hr | All models incl. Wan 2.7 | $0.10/sec (Wan 2.7) | ~200 clips/month |

Breakeven estimates assume 5-second clips with generation overhead. Actual numbers depend on generation speed and GPU utilization.
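The breakeven math above can be sketched as a simple formula. This is a hedged illustration, not the exact model behind the table: the `overhead` multiplier (billed or compute seconds per output second of video) is our assumption, since providers differ in how they bill and how long generation takes; the table's figures already bake in generation overhead and utilization.

```python
def breakeven_clips(hardware_cost: float, api_price_per_sec: float,
                    clip_seconds: float = 5.0, overhead: float = 1.0) -> float:
    """Number of clips at which one-time hardware cost equals cumulative API spend.

    overhead is an assumed multiplier for extra billed/compute seconds per
    output second; with overhead=1.0 this ignores generation overhead.
    """
    per_clip_api_cost = api_price_per_sec * clip_seconds * overhead
    return hardware_cost / per_clip_api_cost

# RTX 3060 (~$300) vs FramePack on FAL.ai ($0.033/sec), 5-second clips:
print(round(breakeven_clips(300, 0.033)))  # raw formula, no overhead factored in
```

Plugging in a higher `overhead` value pulls the breakeven point down sharply, which is why the table's estimates land in the hundreds of clips rather than the thousands.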

Decision Framework

  • Under 200 clips/month: Use API pricing. Zero infrastructure overhead outweighs the per-second cost.
  • 200–500 clips/month: Self-hosting on a consumer GPU (RTX 3060 for FramePack, RTX 4090 for HunyuanVideo) breaks even. The one-time hardware cost pays for itself within 1–3 months.
  • 500+ clips/month: A dedicated pro GPU (A5000/A6000) or cloud H100 rental becomes strongly cost-effective. Running LTX-2 Pro on an A6000 saves $0.06/sec on every clip; at 500 five-second clips, that's $150/month in API savings.
  • Enterprise scale: Cloud H100 rental at ~$2/hr supports Wan 2.7 and every other open-source model. At continuous utilization, this is cheaper than any API pricing for high-quality models.
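The framework above reduces to a simple lookup. A minimal sketch, where the volume thresholds come from the bullets and the model pairings are illustrative:

```python
def hosting_recommendation(clips_per_month: int) -> str:
    """Map monthly clip volume to the hosting tier suggested above."""
    if clips_per_month < 200:
        return "API (zero infrastructure overhead)"
    if clips_per_month <= 500:
        return "Consumer GPU (RTX 3060 for FramePack, RTX 4090 for HunyuanVideo)"
    return "Pro GPU (A5000/A6000) or cloud H100 rental"

print(hosting_recommendation(150))  # API tier
print(hosting_recommendation(350))  # consumer-GPU tier
print(hosting_recommendation(800))  # pro-GPU / cloud tier
```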

Open Source vs Commercial: Quality Comparison

| Metric | Best Open Source | Best Commercial | Gap |
|---|---|---|---|
| Arena ELO (#1) | HappyHorse 1.0 (1,347) | SkyReels V4 (1,223) | Open source leads |
| Cheapest API | HunyuanVideo 1.5 ($0.02/sec) | Pika 2.0 ($0.04/sec) | Open source 2x cheaper |
| Cheapest 1080p + audio | LTX-2 Pro ($0.06/sec) | Hailuo 02 Pro ($0.08, no audio) | Open source wins on features |
| 4K native | LTX-2 Pro ($0.24/sec) | Kling v3 ($0.112/sec) | Commercial 2x cheaper at 4K |
| Best lip-sync | HappyHorse 1.0 (7 lang, no API) | Veo 3.1 ($0.60/sec) | Comparable quality |
| Self-host (consumer GPU) | FramePack (6GB VRAM) | N/A | Open source exclusive |

Key Insight: The Quality Gap Has Closed

The narrative that commercial AI video models are fundamentally better than open-source alternatives is no longer true. The #1 model on the Arena is open source. Every model priced under $0.05/sec is either open source or uses open-source weights. And FramePack proves you can generate AI video on a $300 consumer GPU.

What commercial models still offer is convenience and features: Kling v3’s multi-shot generation, Veo 3.1’s premium lip-sync, and Runway Gen-4.5’s cinematic aesthetics are not yet matched by open-source alternatives. But for raw quality-per-dollar, open source leads.

For complete API pricing across all models, see our AI Video Pricing Guide, or compare providers with our API Provider Guide.

FAQ

What is the best open-source AI video model in 2026?

HappyHorse 1.0 from ATH-AI holds the #1 position on the Artificial Analysis Video Arena with an ELO of 1,347, beating every commercial model. It has 15B parameters, Apache-like commercial license, and 7-language lip-sync. Model weights are expected soon but no API is available yet.

Can I self-host AI video models on consumer GPUs?

Yes. FramePack runs on as little as 6GB VRAM (RTX 3060, ~$300 one-time cost) and can generate videos up to 120 seconds. HunyuanVideo 1.5 requires 14GB VRAM and produces 480p output. Larger models like Wan 2.7 (27B parameters) require an H100 or equivalent enterprise GPU.

Is it cheaper to self-host or use an API for AI video?

At approximately 200–500 clips per month, self-hosting becomes cheaper than API pricing. FramePack on an RTX 3060 costs ~$300 one-time vs $0.033/sec on FAL.ai. HunyuanVideo 1.5 on a rented H100 costs ~$2/hr vs $0.02/sec on WaveSpeed. Below 200 clips/month, API pricing is more cost-effective due to zero infrastructure overhead.

Has open-source AI video caught up with commercial models?

Yes. The quality gap between open-source and commercial AI video models has effectively closed. The #1 model on the Artificial Analysis Arena (HappyHorse 1.0) is open source. Wan 2.7 (Apache 2.0) and LTX-2 Pro (Apache 2.0) compete directly with commercial offerings on quality while being significantly cheaper via API or self-hosting.
