Wan2.2: An Open-Source Advanced Large-Scale Video Generation Model
Size: 1001.26 MB
License: Apache 2.0
1. Tutorial Introduction

Wan2.2 is an advanced AI video generation model open-sourced by Alibaba's Tongyi Wanxiang Lab on July 28, 2025. The release comprises three open-source models: text-to-video (Wan2.2-T2V-A14B), image-to-video (Wan2.2-I2V-A14B), and unified text-image-to-video (Wan2.2-TI2V-5B); the two A14B models each have 27 billion total parameters. Wan2.2 is the first video generation model to introduce a Mixture-of-Experts (MoE) architecture, which improves generation quality while keeping inference cost low, since only one expert is active at each denoising step. It also pioneers a cinematic aesthetic control system that gives precise control over lighting, color, and composition. This tutorial uses the compact 5B model, which supports both text-to-video and image-to-video generation, runs on consumer-grade graphics cards, and is built on a high-efficiency 3D VAE with a high compression rate, enabling fast high-definition video generation. The related research paper is Wan: Open and Advanced Large-Scale Video Generative Models.
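To make the MoE design concrete, the sketch below illustrates the commonly described two-expert scheme of the A14B models: a high-noise expert handles early denoising steps and a low-noise expert handles later ones, so only one expert's parameters are active per step. This is a minimal toy sketch; `ToyExpert`, `denoise`, and the boundary value `t_boundary` are illustrative assumptions, not the official implementation (the real experts are two 14B diffusion transformers).

```python
import torch

class ToyExpert(torch.nn.Module):
    """Hypothetical stand-in for one of the two 14B expert networks."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, t: float) -> torch.Tensor:
        # Predict the denoising direction for latents x at timestep t.
        return self.net(x)

def denoise(latents, timesteps, high_noise_expert, low_noise_expert, t_boundary=0.9):
    # Timesteps are assumed normalized to [0, 1], running from 1 (pure noise) to 0.
    for t in timesteps:
        # Route to exactly one expert per step: only part of the total
        # parameters are active, so per-step cost matches a dense model.
        expert = high_noise_expert if t >= t_boundary else low_noise_expert
        velocity = expert(latents, t)
        latents = latents - velocity * (1.0 / len(timesteps))  # toy Euler update
    return latents

if __name__ == "__main__":
    dim = 8
    high, low = ToyExpert(dim), ToyExpert(dim)
    x = torch.randn(1, dim)                    # stand-in for video latents
    steps = [1.0 - i / 10 for i in range(10)]  # 1.0 -> 0.1
    print(denoise(x, steps, high, low).shape)  # torch.Size([1, 8])
```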
This tutorial uses a single RTX A6000 GPU as the compute resource and deploys the Wan2.2-TI2V-5B model. Two examples, text-to-video generation and image-to-video generation, are provided for testing.
2. Demo
Text-to-video

Image-to-video

3. Operation Steps
1. Start the container

2. Usage Steps
If "Bad Gateway" is displayed, it means the model is initializing. Since the model is large, please wait about 2-3 minutes and refresh the page.
1. Text-to-Video Generation
Specific parameters (a code sketch follows this list):
- Prompt: The text describing the video content you want to generate.
- Duration: Specify the desired video duration (in seconds).
- Output Resolution: Select the resolution (width x height) of the generated video.
- Sampling Steps: The number of denoising steps the diffusion model performs; more steps generally improve quality at the cost of generation time.
- Guidance Scale: Controls how closely the model follows the text prompt; higher values adhere more strictly to the prompt.
- Sample Shift: A sampler-dependent parameter that shifts the sampling timestep schedule, trading off detail against stability.
- Seed: Controls the randomness of generation; the same seed with identical settings reproduces the same video.
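To show how these controls map onto an actual generation call, here is a hedged sketch using the Hugging Face diffusers integration of Wan. The model id `Wan-AI/Wan2.2-TI2V-5B-Diffusers`, the 704x1280 resolution, and the 24 fps / 121-frame defaults are assumptions based on the public release; the hosted demo in this tutorial wraps an equivalent call behind its web UI.

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Assumed model id for the Diffusers-format 5B checkpoint.
MODEL_ID = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

# The VAE is typically kept in float32 for stability; the transformer runs in bf16.
vae = AutoencoderKLWan.from_pretrained(MODEL_ID, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(MODEL_ID, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(
    prompt="A cat walking through a sunlit garden, cinematic lighting",
    height=704, width=1280,  # Output Resolution
    num_frames=121,          # Duration: ~5 s at 24 fps (frames = duration * fps)
    num_inference_steps=50,  # Sampling Steps
    guidance_scale=5.0,      # Guidance Scale
    generator=torch.Generator("cuda").manual_seed(42),  # Seed: fixes randomness
).frames[0]

export_to_video(video, "t2v_output.mp4", fps=24)
```

The Sample Shift control likely corresponds to the scheduler's flow-shift setting; in diffusers this can be set via `UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)`, though the exact wiring inside the hosted demo is an assumption.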

2. Image-to-Video Generation
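Image-to-video exposes the same controls, with an uploaded image serving as the first frame that conditions the video while the prompt steers the motion. The sketch below assumes the same diffusers integration exposes this mode through `WanImageToVideoPipeline`; the pipeline name, model id, and defaults are assumptions, not the demo's confirmed internals.

```python
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

MODEL_ID = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"  # assumed Diffusers-format id

vae = AutoencoderKLWan.from_pretrained(MODEL_ID, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(MODEL_ID, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# The input image becomes the first frame; the prompt describes the motion.
image = load_image("first_frame.png")
video = pipe(
    image=image,
    prompt="The camera slowly zooms in as leaves drift across the scene",
    height=704, width=1280,
    num_frames=121,
    num_inference_steps=50,
    guidance_scale=5.0,
    generator=torch.Generator("cuda").manual_seed(42),
).frames[0]

export_to_video(video, "i2v_output.mp4", fps=24)
```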

4. Discussion
🖌️ If you come across a high-quality project, please leave us a message to recommend it! We have also set up a tutorial exchange group; scan the QR code below and note [SD Tutorial] to join, discuss technical issues, and share your results.

Citation Information
The citation information for this project is as follows:
@article{wan2025,
  title={Wan: Open and Advanced Large-Scale Video Generative Models},
  author={Team Wan and Ang Wang and Baole Ai and Bin Wen and Chaojie Mao and Chen-Wei Xie and Di Chen and Feiwu Yu and Haiming Zhao and Jianxiao Yang and Jianyuan Zeng and Jiayu Wang and Jingfeng Zhang and Jingren Zhou and Jinkai Wang and Jixuan Chen and Kai Zhu and Kang Zhao and Keyu Yan and Lianghua Huang and Mengyang Feng and Ningyi Zhang and Pandeng Li and Pingyu Wu and Ruihang Chu and Ruili Feng and Shiwei Zhang and Siyang Sun and Tao Fang and Tianxing Wang and Tianyi Gui and Tingyu Weng and Tong Shen and Wei Lin and Wei Wang and Wei Wang and Wenmeng Zhou and Wente Wang and Wenting Shen and Wenyuan Yu and Xianzhong Shi and Xiaoming Huang and Xin Xu and Yan Kou and Yangyu Lv and Yifei Li and Yijing Liu and Yiming Wang and Yingya Zhang and Yitong Huang and Yong Li and You Wu and Yu Liu and Yulin Pan and Yun Zheng and Yuntao Hong and Yupeng Shi and Yutong Feng and Zeyinzi Jiang and Zhen Han and Zhi-Fan Wu and Ziyu Liu},
  journal={arXiv preprint arXiv:2503.20314},
  year={2025}
}