Abstract
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video into spatial-temporal visual tokens and propose an embedding method for masked video token modeling to facilitate multi-task learning. We conduct extensive experiments to demonstrate the quality, efficiency, and flexibility of MAGVIT. Our experiments show that (i) MAGVIT performs favorably against state-of-the-art approaches and establishes the best published FVD on three video generation benchmarks, including the challenging Kinetics-600; (ii) MAGVIT's inference is two orders of magnitude faster than that of diffusion models and 60x faster than that of autoregressive models; and (iii) a single MAGVIT model supports ten diverse generation tasks and generalizes across videos from different visual domains. The source code and trained models will be released to the public at https://magvit.cs.cmu.edu.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| text-to-video-generation-on-something | MAGVIT | FVD: 79.1 |
| video-generation-on-bair-robot-pushing | MAGVIT | Cond: 1; FVD score: 62; Pred: 15; Train: 15 |
| video-generation-on-kinetics-600-12-frames | MAGVIT | FVD: 9.9 |
| video-generation-on-ucf-101 | MAGVIT (AR) | FVD16: 265 |
| video-generation-on-ucf-101 | MAGVIT (-L-CG, 128x128, class-conditional) | FVD16: 76±2; Inception Score: 89.27±0.15 |
| video-generation-on-ucf-101 | MAGVIT (-B-CG, 128x128, class-conditional) | FVD16: 159±2; Inception Score: 83.55±0.14 |
| video-prediction-on-bair-robot-pushing-1 | MAGVIT (-B-FP) | FVD: 76±0.1 |
| video-prediction-on-bair-robot-pushing-1 | MAGVIT (-L-FP) | FVD: 62±0.1 |
| video-prediction-on-kinetics-600-12-frames | MAGVIT (-L-FP) | Cond: 5; FVD: 9.9±0.3; Pred: 11 |
| video-prediction-on-kinetics-600-12-frames | MAGVIT (-B-FP) | Cond: 5; FVD: 24.5±0.9; Pred: 11 |
| video-prediction-on-something-something-v2 | MAGVIT | FVD: 28.5 |