Aidan Clark, Jeff Donahue, Karen Simonyan

Abstract
Generative models of natural images have progressed towards high fidelity samples by the strong leveraging of scale. We attempt to carry this success to the field of video modeling by showing that large Generative Adversarial Networks trained on the complex Kinetics-600 dataset are able to produce video samples of substantially higher complexity and fidelity than previous work. Our proposed model, Dual Video Discriminator GAN (DVD-GAN), scales to longer and higher resolution videos by leveraging a computationally efficient decomposition of its discriminator. We evaluate on the related tasks of video synthesis and video prediction, and achieve new state-of-the-art Fréchet Inception Distance for prediction for Kinetics-600, as well as state-of-the-art Inception Score for synthesis on the UCF-101 dataset, alongside establishing a strong baseline for synthesis on Kinetics-600.
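The abstract reports results in terms of Inception Score. As a rough illustration of what that metric measures, the sketch below computes the standard definition, exp of the mean KL divergence between each sample's predicted class distribution p(y|x) and the marginal p(y). It assumes the per-sample class probabilities have already been produced by some pretrained classifier; the function name `inception_score` and the plain-list input format are hypothetical choices for this example, not the evaluation code used by the authors.

```python
import math


def inception_score(probs):
    """Inception Score from per-sample class probabilities (illustrative sketch).

    probs: list of lists, one probability vector p(y|x) per generated sample,
           each summing to 1. How these are obtained (which classifier,
           which preprocessing) is outside the scope of this sketch.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y): average of p(y|x) over all samples.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    # Mean KL(p(y|x) || p(y)); terms with zero probability contribute nothing.
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(
            pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0
        )
    mean_kl /= n
    return math.exp(mean_kl)
```

Under this definition the score is 1 when every sample receives the same class distribution, and it is bounded above by the number of classes, which it reaches only when predictions are confident and spread evenly across all classes.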
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| video-generation-on-bair-robot-pushing | DVD-GAN-FP | Cond: 1; FVD: 109.8; Pred: 15; Train: 15 |
| video-generation-on-kinetics-600-12-frames | DVD-GAN | FVD: 31.1 |
| video-generation-on-kinetics-600-12-frames-1 | DVD-GAN | FID: 2.16 |
| video-generation-on-kinetics-600-48-frames | DVD-GAN | FID: 12.92; Inception Score: 219.05 |
| video-prediction-on-bair-robot-pushing-1 | DVD-GAN-FP | FVD: 109.8 |
| video-prediction-on-kinetics-600-12-frames | DVD-GAN-FP | Cond: 5; FVD: 69.15 ± 0.78; Pred: 11 |