Text-to-Video Generation on UCF-101
Evaluation Metric
FVD16
Evaluation Results
Performance of each model on this benchmark
| Model | FVD16 | Paper Title | Repository |
|---|---|---|---|
| MagicVideo (Zero-shot, 256x256) | 699 | MagicVideo: Efficient Video Generation With Latent Diffusion Models | - |
| Video LDM (Zero-shot, 320x512) | 550.61 | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | |
| LAVIE (Zero-shot, 320x512) | 526.30 | LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models | |
| PYoCo (Zero-shot, 64x64) | 355.19 | Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | - |
| VideoPoet | 355 | VideoPoet: A Large Language Model for Zero-Shot Video Generation | - |
| Lumiere (Zero-shot, 1024x1024) | 332.49 | Lumiere: A Space-Time Diffusion Model for Video Generation | |
| Snap Video (Zero-shot, 288×288) | 260.1 | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | - |
| W.A.L.T 3B | 258.1 | Photorealistic Video Generation with Diffusion Models | - |
| PixelDance (Zero-shot, 256x256) | 242.82 | Make Pixels Dance: High-Dynamic Video Generation | - |
| Snap Video (Zero-shot, 512x288) | 200.2 | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | - |