HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam Ngan Ho Xitong Yang Tushar Nagarajan Lorenzo Torresani Gedas Bertasius

Video ReCap: Recursive Captioning of Hour-Long Videos

Abstract

Most video captioning models are designed to process short video clips of few seconds and output text describing low-level visual concepts (e.g., objects, scenes, atomic actions). However, most real-world videos last for minutes or hours and have a complex hierarchical structure spanning different temporal granularities. We propose Video ReCap, a recursive video captioning model that can process video inputs of dramatically different lengths (from 1 second to 2 hours) and output video captions at multiple hierarchy levels. The recursive video-language architecture exploits the synergy between different video hierarchies and can process hour-long videos efficiently. We utilize a curriculum learning training scheme to learn the hierarchical structure of videos, starting from clip-level captions describing atomic actions, then focusing on segment-level descriptions, and concluding with generating summaries for hour-long videos. Furthermore, we introduce Ego4D-HCap dataset by augmenting Ego4D with 8,267 manually collected long-range video summaries. Our recursive model can flexibly generate captions at different hierarchy levels while also being useful for other complex video understanding tasks, such as VideoQA on EgoSchema. Data, code, and models are available at: https://sites.google.com/view/vidrecap

Code Repositories

tanveer81/rgnet
pytorch
Mentioned in GitHub
md-mohaiminul/VideoRecap
Official
pytorch
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
zero-shot-video-question-answer-on-egoschema-1Video ReCap
Accuracy: 50.23

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
Video ReCap: Recursive Captioning of Hour-Long Videos | Papers | HyperAI