| SRVP | 10 | 222 ± 3 | 0.0736±0.0029 | 29.69±0.32 | - | 30 | 0.8697±0.0046 | 10 | Stochastic Latent Residual Video Prediction | |
| SLAMP | 10 | 228 ± 5 | 0.0795±0.0034 | 29.39±0.30 | - | 30 | 0.8646±0.0050 | 10 | SLAMP: Stochastic Latent Appearance and Motion Prediction | |
| SV2P (from SRVP) | 10 | 636 ± 1 | 0.2049±0.0053 | 28.19±0.31 | - | 30 | 0.838 | 10 | Stochastic Variational Video Prediction | |
| SVG-LP (from SRVP) | 10 | 377 ± 6 | 0.0923±0.0038 | 28.06±0.29 | - | 30 | 0.8438±0.0054 | 10 | Stochastic Video Generation with a Learned Prior | |
| SAVP-VAE | 10 | - | - | 27.77 | - | 20 | 0.852 | - | Stochastic Adversarial Video Prediction | |
| Grid-keypoints | 10 | 144.2 | 0.092 | 27.11 | 2.0 | 40 | 0.837 | 10 | Accurate Grid Keypoint Learning for Efficient Video Prediction | |
| SAVP (from SRVP) | 10 | 374 ± 3 | 0.1120±0.0039 | 26.51±0.29 | - | 30 | 0.7564±0.0062 | 10 | Stochastic Adversarial Video Prediction | |