Jean-Yves Franceschi, Edouard Delasalles, Mickaël Chen, Sylvain Lamprier, Patrick Gallinari

Abstract
Designing video prediction models that account for the inherent uncertainty of the future is challenging. Most works in the literature are based on stochastic image-autoregressive recurrent networks, which raises several performance and applicability issues. An alternative is to use fully latent temporal models which untie frame synthesis and temporal dynamics. However, no such model for stochastic video prediction has been proposed in the literature yet, due to design and training difficulties. In this paper, we overcome these difficulties by introducing a novel stochastic temporal model whose dynamics are governed in a latent space by a residual update rule. This first-order scheme is motivated by discretization schemes of differential equations. It naturally models video dynamics as it allows our simpler, more interpretable, latent model to outperform prior state-of-the-art methods on challenging datasets.
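The residual update rule mentioned in the abstract can be illustrated with a minimal sketch. It assumes the update takes the form y_{t+1} = y_t + f(y_t, z_{t+1}), which is one Euler step with unit step size. The names `residual_step` and `rollout`, the toy tanh map standing in for a learned network, and the standard normal noise are all hypothetical choices made for illustration; they are not taken from the authors' code.

```python
import math
import random


def residual_step(y, z, weights, step=1.0):
    """One first-order (Euler-like) update: y_next = y + step * f(y, z)."""
    # f is a hypothetical stand-in for a learned network:
    # tanh of a linear map applied to the concatenation [y, z].
    inputs = y + z
    delta = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]
    return [yi + step * di for yi, di in zip(y, delta)]


def rollout(y0, n_steps, weights, noise_dim, rng):
    """Unroll the latent state without ever feeding generated frames back in."""
    states = [y0]
    for _ in range(n_steps):
        # Assumption: standard normal noise; a real model would learn this distribution.
        z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        states.append(residual_step(states[-1], z, weights))
    return states


rng = random.Random(0)
latent_dim, noise_dim = 4, 2
weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(latent_dim + noise_dim)]
    for _ in range(latent_dim)
]
states = rollout([0.0] * latent_dim, n_steps=5, weights=weights,
                 noise_dim=noise_dim, rng=rng)
```

In a fully latent model of this kind, each entry of `states` would be passed to a separate decoder to synthesize a frame, so the temporal dynamics never depend on generated images.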
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| video-generation-on-bair-robot-pushing | SRVP | Cond: 2; Pred: 28; Train: 12; FVD score: 162 ± 4; LPIPS: 0.0574 ± 0.0032; PSNR: 19.59 ± 0.27; SSIM: 0.8196 ± 0.0084 |
| video-prediction-on-cityscapes-128x128 | SRVP | Cond: 10; Pred: 20; LPIPS: 0.447 ± 0.014; PSNR: 20.97 ± 0.43; SSIM: 0.603 ± 0.016 |
| video-prediction-on-kth | SRVP | Cond: 10; Pred: 30; Train: 10; FVD: 222 ± 3; LPIPS: 0.0736 ± 0.0029; PSNR: 29.69 ± 0.32; SSIM: 0.8697 ± 0.0046 |
| video-prediction-on-kth-64x64-cond10-pred30 | SRVP | FVD: 222 |