Large-scale weakly-supervised pre-training for video action recognition
Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan

Abstract
Current fully-supervised video datasets consist of only a few hundred thousand videos and fewer than a thousand domain-specific labels. This hinders progress towards advanced video architectures. This paper presents an in-depth study of using large volumes of web videos for pre-training video models for the task of action recognition. Our primary empirical finding is that pre-training at a very large scale (over 65 million videos), despite the noise in social-media videos and hashtags, substantially improves the state-of-the-art on three challenging public action recognition datasets. Further, we examine three questions in the construction of weakly-supervised video action datasets. First, given that actions involve interactions with objects, how should one construct a verb-object pre-training label space to benefit transfer learning the most? Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient, or is pre-training for spatio-temporal features valuable for optimal transfer learning? Finally, actions are generally less well-localized in long videos than in short videos; since action labels are provided at the video level, how should one choose video clips for best performance, given some fixed budget in number or minutes of videos?
Benchmarks
| Benchmark | Model | Metric |
|---|---|---|
| action-classification-on-kinetics-400 | irCSN-152 (IG-Kinetics-65M pretrain) | Acc@1: 82.8 |
| egocentric-activity-recognition-on-epic-1 | R(2+1)D-34 (kinetics) | Actions Top-1 (S2): 16.8 |
| egocentric-activity-recognition-on-epic-1 | R(2+1)D-152-SE (ig) | Actions Top-1 (S2): 25.6 |