Remi Denton; Rob Fergus

Abstract
Generating video frames that accurately predict future world states is challenging. Existing approaches either fail to capture the full distribution of outcomes, or yield blurry generations, or both. In this paper we introduce an unsupervised video generation model that learns a prior model of uncertainty in a given environment. Video frames are generated by drawing samples from this prior and combining them with a deterministic estimate of the future frame. The approach is simple and easily trained end-to-end on a variety of datasets. Sample generations are both varied and sharp, even many frames into the future, and compare favorably to those from existing approaches.
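The sampling idea in the abstract (draw a latent from a prior, then combine it with a deterministic estimate of the next frame) can be illustrated with a toy NumPy sketch. This is not the authors' architecture: the linear maps `W_prior`, `W_frame`, `W_latent`, the sizes `D` and `Z`, and the helper names are all hypothetical stand-ins with fixed random weights instead of trained networks. It only shows how different latent draws lead to different rollouts from the same starting frame.

```python
import numpy as np

D, Z = 16, 4  # toy flattened-frame size and latent size (assumed values)

# Hypothetical fixed weights standing in for trained networks.
weights_rng = np.random.default_rng(0)
W_prior = weights_rng.normal(scale=0.1, size=(2 * Z, D))  # frame -> (mu, logvar)
W_frame = weights_rng.normal(scale=0.1, size=(D, D))      # deterministic path
W_latent = weights_rng.normal(scale=0.1, size=(D, Z))     # stochastic path


def prior(frame):
    """Return mean and log-variance of a diagonal Gaussian over the latent."""
    out = W_prior @ frame
    return out[:Z], out[Z:]


def predict_next(frame, rng):
    """Sample z from the prior and combine it with a deterministic estimate."""
    mu, logvar = prior(frame)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(Z)  # reparameterised draw
    return np.tanh(W_frame @ frame + W_latent @ z)


def rollout(first_frame, steps, seed):
    """Generate `steps` future frames, feeding each prediction back in."""
    rng = np.random.default_rng(seed)
    frames = [first_frame]
    for _ in range(steps):
        frames.append(predict_next(frames[-1], rng))
    return np.stack(frames[1:])


x0 = np.linspace(-1.0, 1.0, D)
future_a = rollout(x0, steps=5, seed=1)
future_b = rollout(x0, steps=5, seed=2)
print(future_a.shape, np.abs(future_a - future_b).max() > 0)
```

Two rollouts with different seeds start from the same frame but diverge, while repeating a seed reproduces the same sequence; in a trained model the prior would be learned so that this variation reflects plausible futures.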
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| video-generation-on-bair-robot-pushing | SVG-FP (from FVD) | Cond: 2 FVD score: 315.5 Pred: 14 Train: 14 |
| video-generation-on-bair-robot-pushing | SVG-LP (from vRNN) | Cond: 2 FVD score: 256.62 LPIPS: 0.061±0.03 Pred: 28 SSIM: 0.816±0.07 Train: 10 |
| video-generation-on-bair-robot-pushing | SVG (from SRVP) | Cond: 2 FVD score: 255±4 LPIPS: 0.0609±0.0034 PSNR: 18.95±0.26 Pred: 28 SSIM: 0.8058±0.0088 Train: 12 |
| video-prediction-on-cityscapes-128x128 | SVG (from Hier-VRNN) | Cond: 2 FVD: 1300.26 LPIPS: 0.549±0.06 Pred: 28 SSIM: 0.574±0.08 Train: 10 |
| video-prediction-on-kth | SVG-LP (from Grid-keypoints) | Cond: 10 FVD: 157.9 LPIPS: 0.129 PSNR: 23.91 Params (M): 22.8 Pred: 40 SSIM: 0.800 Train: 10 |
| video-prediction-on-kth | SVG-LP (from SRVP) | Cond: 10 FVD: 377 ± 6 LPIPS: 0.0923±0.0038 PSNR: 28.06±0.29 Pred: 30 SSIM: 0.8438±0.0054 Train: 10 |
| video-prediction-on-synpickvp | SVG-LP | LPIPS: 0.066 MSE: 51.82 PSNR: 27.38 SSIM: 0.886 |
| video-prediction-on-synpickvp | SVG-Det | LPIPS: 0.068 MSE: 60.60 PSNR: 26.92 SSIM: 0.879 |