Learning Video Representations from Large Language Models

Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar

Abstract

We introduce LaViLa, a new approach to learning video-language representations by leveraging Large Language Models (LLMs). We repurpose pre-trained LLMs to be conditioned on visual input, and finetune them to create automatic video narrators. Our auto-generated narrations offer a number of advantages, including dense coverage of long videos, better temporal synchronization of the visual information and text, and much higher diversity of text. The video-text embedding learned contrastively with these additional auto-generated narrations outperforms the previous state-of-the-art on multiple first-person and third-person video tasks, both in zero-shot and finetuned setups. Most notably, LaViLa obtains an absolute gain of 10.1% on EGTEA classification and 5.9% on Epic-Kitchens-100 multi-instance retrieval benchmarks. Furthermore, LaViLa trained with only half the narrations from the Ego4D dataset outperforms baseline models trained on the full set, and shows positive scaling behavior on increasing pre-training data and model size.
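
The abstract says the video-text embedding is learned contrastively but does not spell out the objective. As a rough illustration only, the sketch below implements a symmetric InfoNCE-style loss over a batch of paired video and narration embeddings, a common choice for this kind of dual-encoder training. It is not taken from the LaViLa code: the function names, the toy embeddings, and the temperature of 0.07 are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over one row of similarity scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_info_nce(video_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss for paired embeddings (hypothetical helper).

    video_embs[i] and text_embs[i] are assumed to describe the same clip;
    every other pairing in the batch is treated as a negative.
    The temperature value is an assumption, not a value from the paper.
    """
    v = [l2_normalize(e) for e in video_embs]
    t = [l2_normalize(e) for e in text_embs]
    n = len(v)

    # sim[i][j] = cosine(video_i, text_j) / temperature
    sim = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Video-to-text direction: the correct text for video i sits in column i.
    v2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n

    # Text-to-video direction: same computation on the transposed matrix.
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n

    return 0.5 * (v2t + t2v)


# Toy 2-D embeddings: aligned pairs should score a much lower loss than swapped ones.
videos = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_info_nce(videos, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = symmetric_info_nce(videos, [[0.0, 1.0], [1.0, 0.0]])
```

In this picture, auto-generated narrations would simply contribute additional (video, text) pairs to each batch; how the paper samples and mixes them is not described in the abstract.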

Code Repositories

facebookresearch/lavila (official, PyTorch, mentioned in GitHub)
Ziyang412/VideoTree (PyTorch, mentioned in GitHub)
ceezh/llovi (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
action-recognition-on-charades-ego | LaViLa (Zero-shot, TimeSformer-L) | mAP: 28.9
action-recognition-on-charades-ego | LaViLa (Finetuned, TimeSformer-L) | mAP: 36.1
action-recognition-on-epic-kitchens-100 | LaViLa (TimeSformer-L) | Action@1: 51; Noun@1: 62.9; Verb@1: 72
egocentric-activity-recognition-on-egtea-1 | LaViLa (Finetuned, TimeSformer-L) | Average Accuracy: 81.75; Mean class accuracy: 76
