Two-Stream Convolutional Networks for Action Recognition in Videos

Karen Simonyan; Andrew Zisserman

Abstract

We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between frames. We also aim to generalise the best performing hand-crafted features within a data-driven learning framework. Our contribution is three-fold. First, we propose a two-stream ConvNet architecture which incorporates spatial and temporal networks. Second, we demonstrate that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data. Finally, we show that multi-task learning, applied to two different action classification datasets, can be used to increase the amount of training data and improve the performance on both. Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art. It also exceeds by a large margin previous attempts to use deep nets for video classification.
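To illustrate the two-stream idea, here is a minimal sketch of late fusion: each stream produces class scores for the same video, and the two score vectors are combined into one prediction. The abstract does not say how the streams are combined, so the weighted averaging of softmax scores below is an assumption made for illustration, and the names `softmax`, `fuse_two_streams`, `predict_class` and `temporal_weight` are hypothetical. The logits stand in for the outputs of a spatial network (still frames) and a temporal network (stacked optical flow).

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Combine per-class scores from the two streams by weighted averaging.

    Assumption: fusion by averaging softmax scores. This is one simple
    option, not necessarily the method used in the paper.
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= temporal_weight <= 1.0:
        raise ValueError("temporal_weight must lie in [0, 1]")

    spatial_probs = softmax(spatial_logits)
    temporal_probs = softmax(temporal_logits)
    spatial_weight = 1.0 - temporal_weight
    return [
        spatial_weight * s + temporal_weight * t
        for s, t in zip(spatial_probs, temporal_probs)
    ]


def predict_class(fused_probs):
    """Return the index of the highest-scoring class."""
    return max(range(len(fused_probs)), key=lambda i: fused_probs[i])


# Made-up scores for three action classes.
spatial_logits = [2.0, 1.0, 0.1]   # appearance favours class 0
temporal_logits = [0.5, 3.0, 0.2]  # motion strongly favours class 1

fused = fuse_two_streams(spatial_logits, temporal_logits)
predicted = predict_class(fused)
```

With equal weights, the strong motion evidence for class 1 outweighs the weaker appearance evidence for class 0, which is the kind of complementary information the two streams are meant to capture. Setting `temporal_weight=0.0` reduces the sketch to the spatial stream alone.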

Code Repositories

HsinYingLee/OPN (caffe2)
jerryljq/ActionRecognition
mcgridles/LENS (pytorch)
Michaelgod/test
woodfrog/ActionRecognition

Benchmarks

Benchmark | Methodology | Metrics
action-classification-on-charades | 2-Strm | MAP: 18.6
action-recognition-in-videos-on-hmdb-51 | Two-Stream (ImageNet pretrained) | Average accuracy of 3 splits: 59.4
action-recognition-in-videos-on-ucf101 | Two-Stream (ImageNet pretrained) | 3-fold Accuracy: 88.0
hand-gesture-recognition-on-viva-hand-1 | Two Stream CNNs | Accuracy: 68