HappyHorse 1.0: The AI Video Model That Broke the Arena
Alibaba's stealth entry topped both Arena leaderboards with ELO 1,347. Led by ex-Kling VP Zhang Di. 15B params, open source. Full review and what it means.
HappyHorse 1.0 is a 15-billion-parameter open-source AI video model built by ATH-AI (Alibaba). It ranks #1 on the Artificial Analysis Video Arena in both Text-to-Video (ELO ~1,347) and Image-to-Video (ELO ~1,406) as of April 2026 — the largest gap over #2 in Arena history. Key specs: 1080p output, 5–10 second clips, native audio-video generation in a single pass, lip-sync across 7 languages, and ~38-second inference on a single H100. The team is led by Zhang Di, former Kuaishou VP who architected Kling 1.0 and 2.0. No public API or downloadable weights yet — official API expected around April 30, 2026. Once weights ship under the Apache 2.0 license, self-hosting will be free.
On April 7, 2026, a pseudonymous model appeared on the Artificial Analysis Video Arena and within 48 hours claimed #1 in both Text-to-Video (Elo 1,347) and Image-to-Video (Elo 1,406) — the largest gap over #2 in leaderboard history. Three days later, Bloomberg broke the story: the model was HappyHorse 1.0, built by a team inside Alibaba. It is the biggest story in AI video this year.
HappyHorse 1.0 is a 15-billion-parameter open-source model that generates synchronized audio and video in a single inference pass, with native lip-sync across 7 languages. It produces 1080p output in approximately 38 seconds on a single H100. The team behind it is led by Zhang Di, former Vice President of Kuaishou and the architect of Kling 1.0 and 2.0 — which means the person who built the previous generation’s best model just built the current generation’s best model, for a different company.
Last verified: April 13, 2026. ELO scores shift daily.
The Story: Stealth Submission, Arena Domination, Alibaba Reveal
The sequence of events reads like a thriller. On April 7, a model with no public identity was submitted to the Artificial Analysis Video Arena — the industry’s most respected blind-comparison leaderboard, where real users vote between anonymous outputs without knowing which model produced what. Within 48 hours, it sat at #1 in both T2V and I2V categories (no audio), crushing ByteDance’s Seedance 2.0 by 74 Elo points in each.
The AI community erupted. Artificial Analysis themselves used the word “pseudonymous” when announcing the model’s addition. On Chinese social media, the phrase that went viral was: “This horse is absolutely wild!” Speculation ranged from a stealth Google project to a well-funded startup nobody had heard of.
On April 10, Bloomberg reported that HappyHorse was created by a team within Alibaba’s newly formed Alibaba Token Hub (ATH) business group. The official HappyHorse X account confirmed: “We are part of Alibaba-ATH.” Alibaba’s Hong Kong-listed shares rose over 4% intraday on the news.
The man behind it: Zhang Di, who previously served as Vice President of Technology at Kuaishou, where he architected the Kling 1.0 and 2.0 video generation models. A graduate of Shanghai Jiao Tong University, Zhang Di joined Alibaba at the end of 2025 to lead the Taotian Group’s Future Life Laboratory — now part of ATH-AI. As the South China Morning Post reported, HappyHorse’s debut offered “a glimpse into China’s race for AI talent,” with Alibaba poaching the very architect of Kuaishou’s flagship AI product.
Specs & Capabilities
| Spec | HappyHorse 1.0 |
|---|---|
| Developer | ATH-AI (ex-Alibaba Taotian Lab) |
| Architecture | Unified Single-Stream Transformer (40 layers) |
| Parameters | 15 billion |
| Max Resolution | 1080p (480p, 720p also supported) |
| Duration | 5–10 seconds |
| FPS | 24, 30 |
| Aspect Ratios | 16:9, 9:16, 4:3, 21:9, 1:1 |
| Text-to-Video | Yes |
| Image-to-Video | Yes |
| Native Audio | Yes (dialogue, ambient, Foley — single pass) |
| Lip-Sync | Yes — 7 languages (CN, EN, JP, KR, DE, FR, +1) |
| Multi-Shot | Yes |
| Video-to-Video | No |
| Camera Control | No |
| Inference Speed | ~38 seconds for 1080p on a single H100 (8-step denoising) |
| Open Source | Yes (Apache 2.0 + Commercial License) — weights pending |
| API Available | No (expected ~April 30, 2026) |
Arena Performance
The Artificial Analysis Video Arena uses blind user voting — not lab-reported benchmarks — making it the most credible quality signal in AI video. Here is where HappyHorse 1.0 stands as of mid-April 2026:
| Category | HappyHorse 1.0 | #2 Model | Gap |
|---|---|---|---|
| T2V (no audio) | ~1,347 Elo (#1) | Seedance 2.0 (~1,273) | +74 |
| I2V (no audio) | ~1,406 Elo (#1) | Seedance 2.0 (~1,332) | +74 |
| T2V (with audio) | ~1,205 Elo (#2) | Seedance 2.0 (~1,219) | −14 |
| I2V (with audio) | ~#2 | Seedance 2.0 (narrow lead) | ~−1 |
A 74-point Elo gap translates to winning roughly 60% of head-to-head blind matchups — a significant margin in a field where models are converging in quality. In the no-audio categories, HappyHorse’s dominance is decisive. In the with-audio categories, Seedance 2.0 holds a narrow edge, likely reflecting ByteDance’s strength in multimodal reference control (up to 9 images, 3 video clips, and 3 audio files per generation).
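The win-rate conversion follows the standard Elo expected-score formula, a property of the rating system itself rather than anything Arena-specific:

```python
def elo_win_prob(gap: float) -> float:
    """Expected win rate for the higher-rated model, given an Elo rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

print(f"{elo_win_prob(74):.1%}")   # the +74 no-audio margin over Seedance 2.0 -> 60.5%
print(f"{elo_win_prob(14):.1%}")   # a 14-point gap (the T2V-with-audio deficit) -> 52.0%
```

A 14-point gap is close to a coin flip, which is why the with-audio rankings could plausibly flip as votes accumulate.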
For context on the broader leaderboard, see the full VidScore leaderboard.
What Makes HappyHorse Different
Unified Single-Stream Architecture
Most video models treat audio and video as separate pipelines — generate the video, then generate or overlay audio. HappyHorse uses a unified single-stream Transformer that processes text, image, video, and audio tokens together in one sequence. Every part of the model sees every modality simultaneously. This is why audio feels matched to what is happening on screen rather than approximately synced after the fact. Dialogue, ambient sound, and Foley effects are all generated in the same forward pass as the video frames.
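To make the contrast with split pipelines concrete, here is a toy sketch of the single-stream idea. Everything in it (the `Token` class, the interleaving scheme, the placeholder values) is invented for illustration; HappyHorse's actual tokenization and layout have not been published.

```python
# Toy illustration: all modalities become tokens in ONE sequence, so
# self-attention lets every audio token attend to every video token (and
# vice versa) during generation. Names and shapes are hypothetical.
from dataclasses import dataclass

@dataclass
class Token:
    modality: str   # "text" | "image" | "video" | "audio"
    payload: int    # stand-in for an embedding or codebook index

def build_stream(text, image_patches, video_patches, audio_codes):
    stream = [Token("text", t) for t in text]
    stream += [Token("image", p) for p in image_patches]
    # Interleave video and audio tokens so they share one timeline; a real
    # model would rely on learned positional/modality embeddings instead.
    for v, a in zip(video_patches, audio_codes):
        stream += [Token("video", v), Token("audio", a)]
    return stream

stream = build_stream(text=[1, 2], image_patches=[10],
                      video_patches=[20, 21], audio_codes=[30, 31])
print([t.modality for t in stream])
# ['text', 'text', 'image', 'video', 'audio', 'video', 'audio']
```

In a two-pipeline design, the audio model only ever sees a summary of the finished video; in a single stream, audio tokens condition on video tokens (and vice versa) at every layer, which is the claimed source of the tighter sync.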
7-Language Native Lip-Sync
HappyHorse generates lip-synced speech in seven languages — including Chinese, English, Japanese, Korean, German, and French — in a single pass. This is not a post-processing lip-sync layer — it’s native to the generation process. For multilingual content creators, this eliminates an entire post-production step. Specify language and vocal quality directly in the prompt: [Speaker, warm female voice, Japanese]. For a full comparison of lip-sync capabilities across models, see our AI Video Lip-Sync Guide.
15B Parameters, Open Source
At 15 billion parameters, HappyHorse is significantly larger than most open-source video models. The announced Apache 2.0 + Commercial License means anyone can use, modify, and commercialize the model once weights are released. This matters because the #1 ranked model on the most credible leaderboard will be free to self-host — a first for AI video at this quality level.
Fast Inference
Approximately 38 seconds for 1080p video on a single H100 via 8-step denoising. This is fast enough for rapid iteration, especially compared to models that require minutes per generation. The 8-step denoising approach (vs. the 20–50 steps typical of diffusion models) is a meaningful architectural advantage for production workflows. For speed benchmarks across all major models, see Fastest AI Video Generators.
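A rough latency model shows why step count dominates, assuming (as is typical for diffusion transformers) that each denoising step is one full forward pass of roughly constant cost. The per-step figure below is derived from the published ~38-second total; treating overhead as zero is a simplification.

```python
def clip_latency(steps: int, sec_per_step: float, overhead_s: float = 0.0) -> float:
    """Rough latency: per-step forward-pass cost times step count, plus any
    fixed overhead (VAE decode, audio head, etc.), which we ignore here."""
    return steps * sec_per_step + overhead_s

# If ~38 s total at 8 steps is mostly denoising, each step costs ~4.75 s.
per_step = 38 / 8
for steps in (8, 25, 50):   # 25-50 steps is a common range for standard samplers
    print(f"{steps} steps -> ~{clip_latency(steps, per_step):.0f} s")
```

Under these assumptions, the same backbone run at a conventional 50 steps would take roughly four minutes per clip, which is the practical difference between iterating on a prompt and queuing a batch overnight.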
Strengths
- #1 on Artificial Analysis Arena in both T2V and I2V (no audio) — the largest gap over #2 in leaderboard history, validated by blind user voting, not lab benchmarks.
- Native audio-video synchronization — dialogue, ambient sound, and Foley generated in a single forward pass with the video. No separate audio pipeline or post-sync step.
- 7-language lip-sync (CN, EN, JP, KR, DE, FR, +1) built into the generation process, not layered on afterward. The only model offering native multilingual lip-sync at this quality level.
- Open source with commercial license — once weights drop, this becomes the first #1-ranked model available for free self-hosting. A major shift for teams priced out of API-only models.
- Fast inference — ~38 seconds for 1080p on a single H100 via 8-step denoising. Rapid iteration is possible.
- Strong temporal coherence — community testing confirms stable facial expressions and consistent motion, especially in simple camera setups. Well-suited for short videos, ads, and pre-visualization.
Limitations (Honest Assessment)
- No public API or pricing: As of April 13, 2026, there is no API, no third-party provider support (FAL.ai, WaveSpeed, Replicate — none), and no downloadable weights. The official API launch is expected around April 30. Until then, you cannot integrate HappyHorse into any production workflow.
- Weights not yet released: The GitHub repository says “coming soon.” The Hugging Face model card exists but weights are pending. The “open source” claim is a promise, not a deliverable — yet.
- Shorter duration than competitors: Max 10 seconds vs. 15 seconds for Kling v3, Seedance 2.0, and SkyReels V4, and 20 seconds for Sora 2. For narrative content, this is a real constraint.
- No video-to-video, camera control, or motion brush: You cannot edit existing footage, specify camera paths, or paint motion regions. Kling v3, Seedance 2.0, and Runway Gen-4.5 all offer camera control; Seedance 2.0 also supports video-to-video editing.
- Complex motion produces artifacts: Community testing reports that HappyHorse handles simple motion well with stable cameras but inconsistencies appear as motion complexity increases. Keep clips short and avoid complex camera movement.
- Fake website proliferation: The model’s viral popularity spawned dozens of fake websites impersonating HappyHorse. The official X account warned: “We have not yet launched an official website. The website you have seen is not ours.” Be cautious about any site claiming to offer HappyHorse access.
- Real-world reliability unproven: The model is days old with limited community testing. Arena ELO scores are strong, but they measure quality in controlled comparisons, not reliability at scale.
Pricing & Availability
As of April 13, 2026, there is no way to buy HappyHorse 1.0 access. Here is the current status:
| Channel | Status | Expected |
|---|---|---|
| Official API | Not available | ~April 30, 2026 |
| Model Weights (GitHub) | “Coming soon” | TBD |
| Hugging Face | Model card only, no weights | TBD |
| FAL.ai | No support | Unknown |
| WaveSpeed | No support | Unknown |
| Replicate | No support | Unknown |
| Demo (happyhorse-ai.com) | Limited daily generations | Available now |
| Arena testing | Via Artificial Analysis blind votes | Available now |
Once weights are released, self-hosting on H100 hardware will be free under the Apache 2.0 + Commercial License. The 38-second inference time on a single H100 makes self-hosting practical for teams with GPU access. Third-party API providers will likely follow quickly once weights are public.
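Back-of-the-envelope self-hosting economics, using the ~38 s/clip figure from the spec table. The hourly rate is an assumption for illustration only; cloud H100 rental prices vary widely by provider.

```python
# Rough self-hosting cost per clip once weights ship.
H100_USD_PER_HOUR = 2.50   # ASSUMED rental rate; substitute your provider's price
SECONDS_PER_CLIP = 38      # ~38 s per 1080p generation on one H100 (spec table)

clips_per_hour = 3600 / SECONDS_PER_CLIP
cost_per_clip = H100_USD_PER_HOUR / clips_per_hour
print(f"~{clips_per_hour:.0f} clips/hour, ~${cost_per_clip:.3f} per clip")
```

Even if the assumed rate is off by 2-3x, per-clip cost stays in the low cents, which is the economic argument behind the "changes the economics of every closed-source API provider" claim later in this piece.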
For current pricing across models you can actually use today, see the AI Video Pricing Guide 2026.
HappyHorse vs The Competition
How does HappyHorse 1.0 compare to the leading AI video models? This table covers quality, features, and availability as of April 2026. For interactive side-by-side comparisons, use the VidScore comparison tool.
| Feature | HappyHorse 1.0 | Kling v3 | Seedance 2.0 | Veo 3.1 |
|---|---|---|---|---|
| Arena Rank (T2V) | #1 (~1,347) | #4 (~1,243) | #2 (~1,273) | Not ranked |
| Max Resolution | 1080p | 4K | 1080p | 4K |
| Max Duration | 10s | 15s | 15s | 8s |
| Native Audio | Yes | Yes | Yes | Yes |
| Lip-Sync | Yes (7 languages) | No | Yes | Yes |
| Multi-Shot | Yes | Yes (6 shots) | Yes | No |
| Camera Control | No | Yes | Yes | No |
| Video-to-Video | No | No | Yes | No |
| Open Source | Yes (pending) | No | No | No |
| API Price (trusted provider) | N/A | $0.112/sec (FAL.ai) | $0.303/sec (FAL.ai) | $0.40/sec (FAL.ai) |
| Available Now | No | Yes | Limited | Yes |
The pattern is clear: HappyHorse wins on raw generation quality (Arena Elo) and lip-sync breadth, but loses on resolution (1080p vs. 4K), duration (10s vs. 15s), feature depth (no camera control, no video-to-video), and — crucially — availability. Kling v3 remains the most practical choice for production work today: it ships 4K, multi-shot, camera control, and native audio at $0.112/sec on FAL.ai (HIGH trust provider) with two live API providers.
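Using the per-second rates from the table above, per-clip costs for a standard 10-second generation work out as follows (HappyHorse is absent because it has no published price yet):

```python
# Per-clip cost at the FAL.ai per-second rates listed in the comparison table.
rates_usd_per_sec = {"Kling v3": 0.112, "Seedance 2.0": 0.303, "Veo 3.1": 0.40}
CLIP_SECONDS = 10

for model, rate in rates_usd_per_sec.items():
    print(f"{model}: ${rate * CLIP_SECONDS:.2f} per 10 s clip")
# Kling v3: $1.12, Seedance 2.0: $3.03, Veo 3.1: $4.00
```

At roughly a third the per-clip price of Seedance 2.0, Kling v3's cost advantage compounds quickly for teams generating hundreds of clips per day.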
For detailed head-to-head breakdowns, see Best AI Video Generators 2026.
What This Means for the Market
Open Source Is Catching Up — Fast
HappyHorse is the first open-source model to hold #1 on the Artificial Analysis Arena. This matters. It means the quality gap between open-source and closed-source video models is closing — or has already closed for generation quality. Once weights ship, any team with H100 access can run the world’s highest-ranked video model for free. That changes the economics of every closed-source API provider.
For more on open-source AI video models, see our Open Source AI Video Guide.
The Alibaba-Kuaishou Rivalry Escalates
Zhang Di built Kling at Kuaishou. Now he’s built something that outranks Kling at Alibaba. Kuaishou’s Kling AI recently hit an annualized revenue run rate of $240 million — and the person who architected the technology is now working for a competitor. The South China Morning Post framed this as a window into “China’s race for AI talent.” Expect Kuaishou to respond aggressively with Kling v4.
Timing Is Everything
HappyHorse’s debut coincides with two market disruptions: OpenAI discontinued its Sora video generation app, and ByteDance paused Seedance 2.0 access following copyright disputes with Hollywood studios. Alibaba’s ATH group was formed on March 16, and HappyHorse topped the Arena on April 8 — a 23-day turnaround from organizational restructuring to global #1. The South China Morning Post interpreted this as Alibaba signaling its AI strategy is shifting from “hero-driven” to “system-driven.”
FAQ
Who made HappyHorse 1.0?
HappyHorse 1.0 was built by ATH-AI, the AI Innovation Division formerly under Alibaba's Taotian Group Future Life Laboratory. The team is led by Zhang Di, former Vice President of Kuaishou and architect of Kling 1.0 and 2.0. Zhang Di joined Alibaba at the end of 2025 to lead multimodal AI innovation.
Is HappyHorse 1.0 open source?
HappyHorse 1.0 is announced as fully open source under an Apache 2.0 + Commercial Usage License, with 15B parameter weights, distilled models, super-resolution modules, and inference code promised on GitHub. However, as of mid-April 2026, the GitHub repository shows "coming soon" — weights have not yet been publicly released.
How much does HappyHorse 1.0 cost?
There is no public API pricing yet. HappyHorse 1.0 has no third-party provider support on FAL.ai, WaveSpeed, or Replicate as of April 13, 2026. The official API launch is expected around April 30, 2026. Once weights are released, self-hosting will be free under the commercial license.
What are HappyHorse 1.0's Arena ELO scores?
On the Artificial Analysis Video Arena, HappyHorse 1.0 holds approximately ELO 1,347 in Text-to-Video (no audio) and ELO 1,406 in Image-to-Video (no audio) — both #1. The I2V score is 74 points ahead of #2 Seedance 2.0, the largest gap in Arena history. In audio-enabled categories, Seedance 2.0 edges ahead by a narrow margin.
Can I use HappyHorse 1.0 right now?
Access is extremely limited. You can test it through the Artificial Analysis Video Arena (blind comparison votes) and a demo at happyhorse-ai.com with daily generation caps. There is no public API, no downloadable weights, and no third-party hosting yet. Beware of the many fake websites that appeared after the model went viral — the official X account has warned that they have not launched a public website.
Is HappyHorse 1.0 better than Seedance 2.0 and Kling v3?
In raw generation quality, yes — HappyHorse 1.0 leads Seedance 2.0 by 74 ELO points in both T2V and I2V (no audio) on the Artificial Analysis Video Arena. However, Seedance 2.0 and Kling v3 both offer camera control, longer durations (15s vs 10s), and are available via API right now. Seedance 2.0 also supports video-to-video editing, and Kling v3 supports 4K resolution and 6-shot multi-shot. HappyHorse wins on quality metrics but lags on features and availability.
Is HappyHorse 1.0 good for production video work?
Not yet. As of April 2026, HappyHorse 1.0 has no public API, no downloadable weights, and limited access through a demo site. It excels in Arena quality benchmarks (ELO 1,347 T2V, 1,406 I2V), but lacks camera control, video-to-video editing, and maxes out at 10 seconds. For production work today, Kling v3 ($0.112/sec on FAL.ai with 4K and camera control) or Seedance 2.0 ($0.022–$0.303/sec with 7 providers) are more practical choices. HappyHorse becomes compelling for production once the API launches and weights are released.
Sources
- Bloomberg: Video AI Model Developed by Alibaba Tops Global Ranking on Debut — Bloomberg exclusive on Alibaba revealing ownership of HappyHorse
- CNBC: Alibaba Revealed as Creator of AI Video Model HappyHorse-1.0 — CNBC coverage of the Arena debut and Alibaba confirmation
- South China Morning Post: Alibaba's HappyHorse Tops Seedance — SCMP analysis of the Alibaba-Kuaishou talent war and Zhang Di's move
- Caixin Global: Alibaba Unveils HappyHorse After AI Model Tops Rankings Under Alias — Caixin on the pseudonymous Arena submission and organizational restructuring
- Artificial Analysis Video Arena — Live ELO rankings for text-to-video and image-to-video models
- GIGAZINE: HappyHorse-1.0 Fake Website Warning — Coverage of the fake website proliferation and official warning from HappyHorse's X account
- HappyHorse-1.0 on Hugging Face — Official model card (weights pending release)