HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

Phenaki: Variable Length Video Generation From Open Domain Textual Description

Ruben Villegas Mohammad Babaeizadeh Pieter-Jan Kindermans Hernan Moraldo Han Zhang Mohammad Taghi Saffar Santiago Castro Julius Kunze Dumitru Erhan

Phenaki: Variable Length Video Generation From Open Domain Textual Description

Abstract

We present Phenaki, a model capable of realistic video synthesis, given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, limited quantities of high quality text-video data and variable length of videos. To address these issues, we introduce a new model for learning video representation which compresses the video to a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text we are using a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address data issues, we demonstrate how joint training on a large corpus of image-text pairs as well as a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to the previous video generation methods, Phenaki can generate arbitrary long videos conditioned on a sequence of prompts (i.e. time variable text or a story) in open domain. To the best of our knowledge, this is the first time a paper studies generating videos from time variable prompts. In addition, compared to the per-frame baselines, the proposed video encoder-decoder computes fewer tokens per video but results in better spatio-temporal consistency.

Code Repositories

Benchmarks

BenchmarkMethodologyMetrics
video-prediction-on-bair-robot-pushing-1Phenaki
FVD: 97

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
Phenaki: Variable Length Video Generation From Open Domain Textual Description | Papers | HyperAI