5 months ago

Large-scale weakly-supervised pre-training for video action recognition

Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan

Abstract

Current fully-supervised video datasets consist of only a few hundred thousand videos and fewer than a thousand domain-specific labels. This hinders the progress towards advanced video architectures. This paper presents an in-depth study of using large volumes of web videos for pre-training video models for the task of action recognition. Our primary empirical finding is that pre-training at a very large scale (over 65 million videos), despite on noisy social-media videos and hashtags, substantially improves the state-of-the-art on three challenging public action recognition datasets. Further, we examine three questions in the construction of weakly-supervised video action datasets. First, given that actions involve interactions with objects, how should one construct a verb-object pre-training label space to benefit transfer learning the most? Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning? Finally, actions are generally less well-localized in long videos vs. short videos; since action labels are provided at a video level, how should one choose video clips for best performance, given some fixed budget of number or minutes of videos?

Benchmarks

Benchmark | Methodology | Metrics
action-classification-on-kinetics-400 | irCSN-152 (IG-Kinetics-65M pretrain) | Acc@1: 82.8
egocentric-activity-recognition-on-epic-1 | R(2+1)D-34 (kinetics) | Actions Top-1 (S2): 16.8
egocentric-activity-recognition-on-epic-1 | R(2+1)D-152-SE (ig) | Actions Top-1 (S2): 25.6
