Wan2.2: An Open-Source Advanced Large-Scale Video Generation Model
Size: 1001.26 MB
License: Apache 2.0
1. Tutorial Introduction

Wan2.2 is an advanced AI video generation model open-sourced by Alibaba's Tongyi Wanxiang Lab on July 28, 2025. The release comprises three open-source models: text-to-video (Wan2.2-T2V-A14B), image-to-video (Wan2.2-I2V-A14B), and unified text-image-to-video (Wan2.2-TI2V-5B); the two A14B models each have 27 billion total parameters. Wan2.2 is the first video generation model to introduce a Mixture-of-Experts (MoE) architecture, which improves generation quality while keeping inference cost low, since only one expert is active at each denoising step. It also pioneers a cinematic aesthetic control system that gives precise control over lighting, color, and composition. This tutorial uses the compact 5B model, which supports both text-to-video and image-to-video generation, runs on consumer-grade graphics cards, and is built on a high-efficiency 3D VAE with a high compression rate, enabling fast high-definition video generation. The related research paper is Wan: Open and Advanced Large-Scale Video Generative Models.
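To make the MoE design concrete, the sketch below illustrates the commonly described two-expert scheme of the A14B models: a high-noise expert handles early denoising steps and a low-noise expert handles later ones, so only one expert's parameters are active per step. This is a minimal toy sketch; `ToyExpert`, `denoise`, and the boundary value `t_boundary` are illustrative assumptions, not the official implementation (the real experts are two 14B diffusion transformers).

```python
import torch

class ToyExpert(torch.nn.Module):
    """Hypothetical stand-in for one of the two 14B expert networks."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, t: float) -> torch.Tensor:
        # Predict the denoising direction for latents x at timestep t.
        return self.net(x)

def denoise(latents, timesteps, high_noise_expert, low_noise_expert, t_boundary=0.9):
    # Timesteps are assumed normalized to [0, 1], running from 1 (pure noise) to 0.
    for t in timesteps:
        # Route to exactly one expert per step: only part of the total
        # parameters are active, so per-step cost matches a dense model.
        expert = high_noise_expert if t >= t_boundary else low_noise_expert
        velocity = expert(latents, t)
        latents = latents - velocity * (1.0 / len(timesteps))  # toy Euler update
    return latents

if __name__ == "__main__":
    dim = 8
    high, low = ToyExpert(dim), ToyExpert(dim)
    x = torch.randn(1, dim)                    # stand-in for video latents
    steps = [1.0 - i / 10 for i in range(10)]  # 1.0 -> 0.1
    print(denoise(x, steps, high, low).shape)  # torch.Size([1, 8])
```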
This tutorial uses a single RTX A6000 GPU as the compute resource and deploys the Wan2.2-TI2V-5B model. Two examples, text-to-video generation and image-to-video generation, are provided for testing.
2. Demo
Text-to-video

Image-to-video

3. Operation Steps
1. Start the container

2. Usage Steps
If "Bad Gateway" is displayed, it means the model is initializing. Since the model is large, please wait about 2-3 minutes and refresh the page.
1. Text-to-Video Generation
Specific parameters (a code sketch follows this list):
- Prompt: The text describing the video content you want to generate.
- Duration: Specify the desired video duration (in seconds).
- Output Resolution: Select the resolution (width x height) of the generated video.
- Sampling Steps: The number of denoising steps the diffusion model performs; more steps generally improve quality at the cost of generation time.
- Guidance Scale: Controls how closely the model follows the text prompt; higher values adhere more strictly to the prompt.
- Sample Shift: A sampler-dependent parameter that shifts the sampling timestep schedule, trading off detail against stability.
- Seed: Controls the randomness of generation; the same seed with identical settings reproduces the same video.
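To show how these controls map onto an actual generation call, here is a hedged sketch using the Hugging Face diffusers integration of Wan. The model id `Wan-AI/Wan2.2-TI2V-5B-Diffusers`, the 704x1280 resolution, and the 24 fps / 121-frame defaults are assumptions based on the public release; the hosted demo in this tutorial wraps an equivalent call behind its web UI.

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Assumed model id for the Diffusers-format 5B checkpoint.
MODEL_ID = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

# The VAE is typically kept in float32 for stability; the transformer runs in bf16.
vae = AutoencoderKLWan.from_pretrained(MODEL_ID, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(MODEL_ID, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(
    prompt="A cat walking through a sunlit garden, cinematic lighting",
    height=704, width=1280,  # Output Resolution
    num_frames=121,          # Duration: ~5 s at 24 fps (frames = duration * fps)
    num_inference_steps=50,  # Sampling Steps
    guidance_scale=5.0,      # Guidance Scale
    generator=torch.Generator("cuda").manual_seed(42),  # Seed: fixes randomness
).frames[0]

export_to_video(video, "t2v_output.mp4", fps=24)
```

The Sample Shift control likely corresponds to the scheduler's flow-shift setting; in diffusers this can be set via `UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)`, though the exact wiring inside the hosted demo is an assumption.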

2. Image-to-Video Generation
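Image-to-video exposes the same controls, with an uploaded image serving as the first frame that conditions the video while the prompt steers the motion. The sketch below assumes the same diffusers integration exposes this mode through `WanImageToVideoPipeline`; the pipeline name, model id, and defaults are assumptions, not the demo's confirmed internals.

```python
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

MODEL_ID = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"  # assumed Diffusers-format id

vae = AutoencoderKLWan.from_pretrained(MODEL_ID, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(MODEL_ID, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# The input image becomes the first frame; the prompt describes the motion.
image = load_image("first_frame.png")
video = pipe(
    image=image,
    prompt="The camera slowly zooms in as leaves drift across the scene",
    height=704, width=1280,
    num_frames=121,
    num_inference_steps=50,
    guidance_scale=5.0,
    generator=torch.Generator("cuda").manual_seed(42),
).frames[0]

export_to_video(video, "i2v_output.mp4", fps=24)
```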

4. Discussion
🖌️ If you come across a high-quality project, please leave us a message to recommend it! We have also set up a tutorial exchange group; scan the QR code below and note [SD Tutorial] to join, discuss technical issues, and share your results.

Citation Information
The citation information for this project is as follows:
@article{wan2025,
  title={Wan: Open and Advanced Large-Scale Video Generative Models},
  author={Team Wan and Ang Wang and Baole Ai and Bin Wen and Chaojie Mao and Chen-Wei Xie and Di Chen and Feiwu Yu and Haiming Zhao and Jianxiao Yang and Jianyuan Zeng and Jiayu Wang and Jingfeng Zhang and Jingren Zhou and Jinkai Wang and Jixuan Chen and Kai Zhu and Kang Zhao and Keyu Yan and Lianghua Huang and Mengyang Feng and Ningyi Zhang and Pandeng Li and Pingyu Wu and Ruihang Chu and Ruili Feng and Shiwei Zhang and Siyang Sun and Tao Fang and Tianxing Wang and Tianyi Gui and Tingyu Weng and Tong Shen and Wei Lin and Wei Wang and Wei Wang and Wenmeng Zhou and Wente Wang and Wenting Shen and Wenyuan Yu and Xianzhong Shi and Xiaoming Huang and Xin Xu and Yan Kou and Yangyu Lv and Yifei Li and Yijing Liu and Yiming Wang and Yingya Zhang and Yitong Huang and Yong Li and You Wu and Yu Liu and Yulin Pan and Yun Zheng and Yuntao Hong and Yupeng Shi and Yutong Feng and Zeyinzi Jiang and Zhen Han and Zhi-Fan Wu and Ziyu Liu},
  journal={arXiv preprint arXiv:2503.20314},
  year={2025}
}