HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers

Jaehoon Yoo Semin Kim Doyup Lee Chiheon Kim Seunghoon Hong

Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers

Abstract

Autoregressive transformers have shown remarkable success in video generation. However, the transformers are prohibited from directly learning the long-term dependency in videos due to the quadratic complexity of self-attention, and inherently suffering from slow inference time and error propagation due to the autoregressive process. In this paper, we propose Memory-efficient Bidirectional Transformer (MeBT) for end-to-end learning of long-term dependency in videos and fast inference. Based on recent advances in bidirectional transformers, our method learns to decode the entire spatio-temporal volume of a video in parallel from partially observed patches. The proposed transformer achieves a linear time complexity in both encoding and decoding, by projecting observable context tokens into a fixed number of latent tokens and conditioning them to decode the masked tokens through the cross-attention. Empowered by linear complexity and bidirectional modeling, our method demonstrates significant improvement over the autoregressive Transformers for generating moderately long videos in both quality and speed. Videos and code are available at https://sites.google.com/view/mebt-cvpr2023 .

Code Repositories

Ugness/MeBT
Official
pytorch
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
video-generation-on-ucf-101MeBT (128x128, unconditional)
FVD128: 968
FVD16: 438
Inception Score: 65.93

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers | Papers | HyperAI