Wan AI 2.1

What is Wan AI?

Wan AI is an advanced and powerful visual generation model developed by Tongyi Lab of Alibaba Group. It can generate videos based on text, images, and other control signals. The Wan2.1 series models are now fully open-source.

Overview of Wan AI

SOTA Performance of Wan AI

Wan AI 2.1 consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.

Supports Consumer-grade GPUs

The T2V-1.3B model requires only 8.19 GB VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without optimization techniques like quantization). Its performance is even comparable to some closed-source models.
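As a rough illustration of the figures quoted above, the sketch below checks whether a GPU's memory covers the stated 8.19 GB requirement and converts the quoted timing into generation seconds per second of output video. The helper names and the example memory sizes are assumptions made for illustration; they are not part of the Wan AI codebase.

```python
# Figures quoted in the text above for the T2V-1.3B model.
REQUIRED_VRAM_GB = 8.19   # stated VRAM requirement
CLIP_SECONDS = 5          # length of the generated 480P clip
GENERATION_MINUTES = 4    # approximate time on an RTX 4090, unoptimized


def fits_in_vram(available_gb: float, required_gb: float = REQUIRED_VRAM_GB) -> bool:
    """Hypothetical helper: True if the GPU has at least the required memory."""
    return available_gb >= required_gb


def generation_seconds_per_video_second(
    generation_minutes: float = GENERATION_MINUTES,
    clip_seconds: float = CLIP_SECONDS,
) -> float:
    """Wall-clock seconds of generation needed per second of output video."""
    return generation_minutes * 60 / clip_seconds


if __name__ == "__main__":
    print(fits_in_vram(12.0))                     # a 12 GB card
    print(fits_in_vram(8.0))                      # an 8 GB card
    print(generation_seconds_per_video_second())  # 4 min for a 5 s clip
```

At the quoted figures, each second of 480P output takes roughly 48 seconds to generate on an RTX 4090.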

Multiple tasks of Wan AI

Wan AI 2.1 excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation.

Visual Text Generation of Wan AI

Wan AI 2.1 is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications.

Powerful Video VAE of Wan AI

Wan-VAE delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.

Features of Wan AI

Complex Motions of Wan AI

Excels at generating realistic videos featuring extensive body movements, complex rotations, dynamic scene transitions, and fluid camera motions.

Physical Simulation of Wan AI

Generates videos that accurately simulate real-world physics and realistic object interactions.

Cinematic Quality

Offers movie-like visuals with rich textures and a variety of stylized effects.

Controllable Editing of Wan AI

Features a universal editing model for precise edits using image or video references.

Visual Text Generation of Wan AI

Creates text and dynamic text effects in videos directly from text prompts.

8-Bit Racing of Wan AI

Prompt: A retro 8-bit style animation of a car race intro. Pixelated muscle cars, each with distinct colors and designs, line up at a starting line in a vast, pixelated desert landscape. Large, pixelated text "WANX RACING" flashes above the cars in vibrant neon colors, reminiscent of classic arcade game titles. The camera pans across the scene, highlighting the retro aesthetic and text. The background features a simple, pixelated desert landscape with a blocky sunset casting warm, golden hues over the scene. The entire environment is bathed in vibrant, pixelated neon colors, enhancing the nostalgic feel.

Merry Christmas of Wan AI

Prompt: Realist, beautifully decorated Christmas party scene, Christmas trees adorned with colorful lights and gifts, flames dancing in the fireplace, gingerbread people wearing Christmas hats dancing around the tree, and tables filled with grilled turkey and other delicacies. Exquisite text effects pop up on the screen: "Merry Christmas!" The screen is exquisite, sophisticated, and concise.

Mad Racing

Prompt: A retro 70s-style title sequence for a fictional action movie. Hand-drawn, stylized text "WANX" appears dynamically on screen, overlaid on fast-paced clips of car chases, explosions, and daring stunts. The text is bold, gritty, and slightly distorted, reflecting the 70s action movie aesthetic. A montage of high-octane scenes with a retro film grain effect, featuring warm, vintage colors. The sequences are bathed in golden hour light, enhancing the nostalgic feel.

Sound Effects & Music of Wan AI

Generates sound effects and background music that perfectly align with visual content and rhythm.

Ferrets Entering Water of Wan AI

Prompt: The camera moves rapidly from far to near, with a low angle of view, standing on a log. In the distant view, a white ferret suddenly appears, playing with the log and jumping into the water, then swimming out of the water and sticking its head out. At this moment, the camera zooms in to show a close-up of the white ferret. Several berry trees next to it are splashed with water, moss and snow cover the ground, and the water surface is covered by green fallen leaves. The background is white birch.

Concert

Prompt: A group of people is performing a symphony in the Vienna Hall.

Ice Falling of Wan AI

Prompt: A group of people is performing a symphony in the Vienna Hall.

Product Features

Our product offers a user-friendly way to use our models and create inspiring video content.

Wan AI 2.1 Open Source

In this repo, we release the code and weights for Wan AI 2.1, a comprehensive and open suite of video foundation models designed to push the boundaries of video generation.

Wan AI 2.1-I2V-14B

The I2V-14B model outperforms leading closed-source models as well as all existing open-source models, achieving SOTA performance. It generates videos with complex visual scenes and motion patterns from input text and images, and is available in both 480P and 720P variants.

Wan AI 2.1-T2V-14B

480-720P

The T2V-14B model sets a new SOTA among both open-source and closed-source models, generating high-quality visuals with substantial motion dynamics. It is also the only video model capable of producing both Chinese and English text, and it supports video generation at both 480P and 720P resolutions.

Wan AI 2.1-T2V-1.3B

480P

The T2V-1.3B model supports video generation on almost all consumer-grade GPUs, requiring only 8.19 GB of VRAM to produce a 5-second 480P video, with a generation time of about 4 minutes on an RTX 4090 GPU. Through pre-training and distillation, it surpasses larger open-source models and achieves performance comparable to some advanced closed-source models.
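The three released variants can be summarized as a small lookup table. The sketch below is only an illustration: the dictionary layout, the task labels, and the `pick_models` helper are hypothetical, and only the model names and supported resolutions come from the descriptions above.

```python
# Variants and resolutions as described in the text; the structure is illustrative.
MODELS = {
    "Wan AI 2.1-I2V-14B": {"task": "image-to-video", "resolutions": {"480P", "720P"}},
    "Wan AI 2.1-T2V-14B": {"task": "text-to-video", "resolutions": {"480P", "720P"}},
    "Wan AI 2.1-T2V-1.3B": {"task": "text-to-video", "resolutions": {"480P"}},
}


def pick_models(task: str, resolution: str) -> list[str]:
    """Hypothetical helper: names of variants supporting the task and resolution."""
    return sorted(
        name
        for name, spec in MODELS.items()
        if spec["task"] == task and resolution in spec["resolutions"]
    )


if __name__ == "__main__":
    print(pick_models("text-to-video", "720P"))
    print(pick_models("text-to-video", "480P"))
    print(pick_models("image-to-video", "480P"))
```

For example, a request for 720P text-to-video matches only the T2V-14B variant, while 480P text-to-video matches both T2V models.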

Tech Report

Stay tuned for the upcoming release of our comprehensive technical report for more details.

Built upon the mainstream diffusion transformer paradigm, Wan AI 2.1 achieves significant advancements in generative capabilities through a series of innovations, including our novel spatio-temporal variational autoencoder (VAE), scalable pre-training strategies, large-scale data construction, and automated evaluation metrics. These contributions collectively enhance the model's performance and versatility.