3 months ago

Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition

M. Esat Kalfaoglu, Sinan Kalkan, A. Aydin Alatan

Abstract

In this work, we combine 3D convolution with late temporal modeling for action recognition. To this end, we replace the conventional Temporal Global Average Pooling (TGAP) layer at the end of a 3D convolutional architecture with a Bidirectional Encoder Representations from Transformers (BERT) layer in order to better utilize temporal information through BERT's attention mechanism. We show that this replacement improves the performance of many popular 3D convolution architectures for action recognition, including ResNeXt, I3D, SlowFast and R(2+1)D. Moreover, we provide state-of-the-art results on both the HMDB51 and UCF101 datasets, with 85.10% and 98.69% top-1 accuracy, respectively. The code is publicly available.
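
To make the contrast between average pooling and attention-based pooling over time concrete, the following is a minimal, dependency-free sketch. It is not the authors' implementation and it is not a full BERT layer: it assumes a single attention head, and it omits the learned query/key/value projections, positional embeddings, and feed-forward sublayers. The function names (`tgap`, `attention_pool`), the `query` vector (standing in for a learned classification token), and the toy feature values are all hypothetical.

```python
import math

def tgap(features):
    """Temporal global average pooling: mean of each channel over time."""
    t = len(features)
    return [sum(channel) / t for channel in zip(*features)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, query):
    """Scaled dot-product attention from one query vector over T time steps.

    Simplification: `query` stands in for a learned classification token,
    and the per-step features serve as both keys and values.
    """
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, f)) / math.sqrt(d)
              for f in features]
    weights = softmax(scores)
    pooled = [sum(w * f[i] for w, f in zip(weights, features))
              for i in range(d)]
    return pooled, weights

# Toy input (hypothetical): T = 3 time steps, D = 2 channels.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

avg = tgap(features)

# A zero query scores every time step equally, so attention reduces to TGAP.
uniform_pooled, uniform_weights = attention_pool(features, [0.0, 0.0])

# A query aligned with channel 0 up-weights the steps where it is active.
focused_pooled, focused_weights = attention_pool(features, [10.0, 0.0])
```

In this sketch, average pooling is the special case in which every time step receives the same weight. A non-trivial query lets the pooled representation emphasize some time steps over others, which is the kind of flexibility the abstract attributes to the attention mechanism.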

Code Repositories

artest08/LateTemporalModeling3DCNN (official, PyTorch)
kietngt00/hmdb51-recognition (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
action-recognition-in-videos-on-hmdb-51 | R2+1D-BERT | Average accuracy of 3 splits: 85.10
action-recognition-on-ucf-101 | R2+1D-BERT | 3-fold Accuracy: 98.69
