A Closer Look at Spatiotemporal Convolutions for Action Recognition

Du Tran; Heng Wang; Lorenzo Torresani; Jamie Ray; Yann LeCun; Manohar Paluri

Abstract

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significant advantages in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block, "R(2+1)D", which gives rise to CNNs that achieve results comparable or superior to the state of the art on Sports-1M, Kinetics, UCF101 and HMDB51.
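
To make the factorization concrete, the sketch below counts weights for a full t x d x d 3D convolution and for a spatial 1 x d x d convolution followed by a temporal t x 1 x 1 convolution. The abstract does not say how wide the intermediate layer is, so the choice of `m` here is an assumption for illustration: it is picked so that the factorized pair has roughly the same number of weights as the full 3D filter bank. All function and variable names are hypothetical, and biases are ignored.

```python
def conv3d_params(n_in, n_out, t, d):
    """Weights in a full t x d x d 3D convolution (bias ignored)."""
    return n_in * n_out * t * d * d


def matched_mid_channels(n_in, n_out, t, d):
    """Intermediate width m (an assumption of this sketch) obtained by
    solving n_in*m*d*d + m*n_out*t = n_in*n_out*t*d*d for m, rounded down."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def factorized_params(n_in, n_out, t, d, m):
    """Weights in a 1 x d x d spatial conv (n_in -> m) followed by a
    t x 1 x 1 temporal conv (m -> n_out), bias ignored."""
    spatial = n_in * m * d * d
    temporal = m * n_out * t
    return spatial + temporal


# Example layer: 64 -> 64 channels with a 3 x 3 x 3 kernel.
n_in, n_out, t, d = 64, 64, 3, 3
m = matched_mid_channels(n_in, n_out, t, d)
full = conv3d_params(n_in, n_out, t, d)
fact = factorized_params(n_in, n_out, t, d, m)
print(m, full, fact)  # prints: 144 110592 110592
```

With a matched weight budget, the factorized form differs from the full 3D convolution mainly in having two layers where there was one, which leaves room for an extra nonlinearity between the spatial and temporal steps.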

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| action-classification-on-kinetics-400 | R[2+1]D-RGB (Sports-1M pretrain) | Acc@1: 74.3; Acc@5: 91.4 |
| action-classification-on-kinetics-400 | R[2+1]D-RGB | Acc@1: 72; Acc@5: 90 |
| action-classification-on-kinetics-400 | R[2+1]D-Two-Stream | Acc@1: 73.9; Acc@5: 90.9 |
| action-classification-on-kinetics-400 | R[2+1]D | Acc@1: 72; Acc@5: 90 |
| action-classification-on-kinetics-400 | R[2+1]D-Flow | Acc@1: 67.5; Acc@5: 87.2 |
| action-classification-on-kinetics-400 | R[2+1]D-Flow (Sports-1M pretrain) | Acc@1: 75.4; Acc@5: 91.9 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-Flow (Kinetics pretrained) | Average accuracy of 3 splits: 76.4 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-RGB (Sports1M pretrained) | Average accuracy of 3 splits: 66.6 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-TwoStream (Kinetics pretrained) | Average accuracy of 3 splits: 78.7 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-RGB (Kinetics pretrained) | Average accuracy of 3 splits: 74.5 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-TwoStream (Sports1M pretrained) | Average accuracy of 3 splits: 72.7 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-Flow (Sports1M pretrained) | Average accuracy of 3 splits: 70.1 |
| action-recognition-in-videos-on-sports-1m | R[2+1]D-Two-Stream-32frame | Video hit@1: 73.3; Video hit@5: 91.9 |
| action-recognition-in-videos-on-sports-1m | R[2+1]D-RGB-32frame | Clip hit@1: 57; Video hit@1: 73; Video hit@5: 91.5 |
| action-recognition-in-videos-on-sports-1m | R[2+1]D-Flow-32frame | Clip hit@1: 46.4; Video hit@1: 68.4; Video hit@5: 88.7 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-Flow (Sports-1M pretrained) | 3-fold Accuracy: 93.3 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-RGB (Sports-1M pretrained) | 3-fold Accuracy: 93.6 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-Flow (Kinetics pretrained) | 3-fold Accuracy: 95.5 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-TwoStream (Kinetics pretrained) | 3-fold Accuracy: 97.3 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-RGB (Kinetics pretrained) | 3-fold Accuracy: 96.8 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-TwoStream (Sports-1M pretrained) | 3-fold Accuracy: 95 |
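
The page does not define Acc@1, Acc@5 or hit@k. The sketch below assumes the conventional reading: a sample counts as correct at k when its true label is among the k highest-scoring classes, and the metric is the percentage of such samples. The function name and the toy scores are hypothetical and are unrelated to the reported numbers.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k
    highest-scoring classes (assumed definition of Acc@k / hit@k)."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example: 4 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.4, 0.35, 0.25],
]
labels = [1, 1, 2, 2]

acc1 = top_k_accuracy(scores, labels, k=1)
acc2 = top_k_accuracy(scores, labels, k=2)
print(acc1, acc2)  # prints: 50.0 75.0
```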
