Lluis Castrejon; Nicolas Ballas; Aaron Courville

Abstract
Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder (VAE). While VAEs can handle uncertainty and model multiple possible future outcomes, they tend to produce blurry predictions. In this work we argue that this is a sign of underfitting. To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher-capacity likelihood models. Our approach relies on a hierarchy of latent variables, which defines a family of flexible prior and posterior distributions in order to better model the probability of future sequences. We validate our proposal through a series of ablation experiments and compare our approach to current state-of-the-art latent variable models. Our method performs favorably under several metrics on three different datasets.
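To make the role of the latent hierarchy more concrete, the following is a minimal sketch of how the KL term of a VAE objective can decompose into one term per latent level. It assumes diagonal Gaussian priors and posteriors at every level, which the abstract does not state, and it assumes that each level's parameters have already been computed from sampled values of the levels above it. The names `gaussian_kl` and `hierarchical_kl` are hypothetical and are not taken from the authors' code.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    Assumption: both distributions are diagonal Gaussians given as lists
    of per-dimension means and standard deviations.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def hierarchical_kl(levels):
    """Sum the per-level KL terms of a latent hierarchy.

    Each entry of `levels` is a tuple (mu_q, sigma_q, mu_p, sigma_p) holding
    the posterior and prior parameters for one level (hypothetical layout).
    """
    return sum(gaussian_kl(*level) for level in levels)


# Two illustrative levels: the first posterior is shifted away from its prior,
# the second posterior matches its prior exactly and contributes no KL.
levels = [
    ([1.0], [1.0], [0.0], [1.0]),
    ([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]),
]
total_kl = hierarchical_kl(levels)
```

In a sketch like this, adding levels gives the model more places to encode information about the future, at the cost of one additional KL term per level in the training objective.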
Benchmarks
| Benchmark | Methodology | Cond | Train | Pred | FVD | LPIPS | SSIM |
|---|---|---|---|---|---|---|---|
| video-generation-on-bair-robot-pushing | VRNN 1L | 2 | 10 | 28 | 149.22 | 0.058 ± 0.03 | 0.829 ± 0.06 |
| video-generation-on-bair-robot-pushing | Hier-VRNN | 2 | 10 | 28 | 143.4 | 0.055 ± 0.03 | 0.822 ± 0.06 |
| video-prediction-on-cityscapes-128x128 | Hier-VRNN | 2 | 10 | 28 | 567.51 | 0.264 ± 0.07 | 0.628 ± 0.1 |