Deep set conditioned latent representations for action recognition

Singh Akash ; De Schepper Tom ; Mets Kevin ; Hellinckx Peter ; Oramas Jose ; Latre Steven

Abstract

In recent years, multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANNs) still struggle to classify them. In the real world, atomic actions often temporally connect to form more complex composite actions. The challenge lies in recognising composite actions of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concept of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two-stream relational network that employs a latent representation of state and a visual representation for reasoning over events and actions. They learn to reason about temporally connected actions in order to identify all of them in the video. The proposed method achieves an improvement of around 1.49% mAP in atomic action recognition and 17.57% mAP in composite action recognition over an I3D-NL baseline on the CATER dataset.
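
The "unordered set-based latent representations" mentioned in the abstract can be illustrated with the general Deep Sets form rho(sum of phi(x_i)), where sum pooling makes the encoding independent of element order. The following is only a minimal sketch of that general idea, not the SCI3D architecture: the layer sizes, the random weights, the helper names (`dense_tanh`, `deep_set_encode`) and the three-number object states are all hypothetical choices made for illustration.

```python
import math
import random


def dense_tanh(vector, weights):
    """Apply a bias-free linear layer followed by tanh (hypothetical layer choice)."""
    return [math.tanh(sum(w * v for w, v in zip(row, vector))) for row in weights]


def deep_set_encode(elements, w_phi, w_rho):
    """Encode an unordered set as rho(sum_i phi(x_i))."""
    pooled = [0.0] * len(w_phi)
    for x in elements:
        h = dense_tanh(x, w_phi)                       # phi: per-element encoder
        pooled = [p + hi for p, hi in zip(pooled, h)]  # order-independent sum pooling
    return dense_tanh(pooled, w_rho)                   # rho: set-level encoder


# Hypothetical sizes and random weights, for illustration only.
rng = random.Random(0)
IN_DIM, HIDDEN, OUT_DIM = 3, 8, 4
w_phi = [[rng.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(HIDDEN)]
w_rho = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(OUT_DIM)]

# Hypothetical object states (three numbers per object).
objects = [[0.1, 0.5, 0.0], [0.9, 0.2, 0.0], [0.4, 0.4, 1.0]]
shuffled = [objects[2], objects[0], objects[1]]

z1 = deep_set_encode(objects, w_phi, w_rho)
z2 = deep_set_encode(shuffled, w_phi, w_rho)

# The two encodings agree up to floating-point rounding.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(z1, z2)))
```

Reordering the input objects leaves the latent vector unchanged up to rounding, which is the property that makes such a representation suitable for a scene whose objects have no natural order.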

Benchmarks

Benchmark | Methodology | Metrics
atomic-action-recognition-on-cater | FasterRCNN | Average-mAP: 63.85
atomic-action-recognition-on-cater | R3D-NL | Average-mAP: 95.28
atomic-action-recognition-on-cater | SCI3D | Average-mAP: 96.77
atomic-action-recognition-on-cater | Single stream SCI3D | Average-mAP: 91.82
composite-action-recognition-on-cater | Single stream SCI3D | Average-mAP: 69.76
composite-action-recognition-on-cater | SCI3D | Average-mAP: 66.71
composite-action-recognition-on-cater | R3D-NL | Average-mAP: 52.19
composite-action-recognition-on-cater | FasterRCNN | Average-mAP: 25.45
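
The improvements quoted in the abstract can be cross-checked against the Average-mAP values listed above. The sketch below simply subtracts the baseline score from each method's score. It assumes that the rows labelled R3D-NL correspond to the baseline the abstract calls I3D-NL; the listing itself does not confirm this. The names `scores`, `BASELINE` and `gains_over_baseline` are introduced here for illustration.

```python
# Average-mAP values copied from the benchmark listing.
scores = {
    "atomic": {
        "FasterRCNN": 63.85,
        "R3D-NL": 95.28,
        "SCI3D": 96.77,
        "Single stream SCI3D": 91.82,
    },
    "composite": {
        "Single stream SCI3D": 69.76,
        "SCI3D": 66.71,
        "R3D-NL": 52.19,
        "FasterRCNN": 25.45,
    },
}

# Assumption: the R3D-NL rows are the baseline the abstract refers to as I3D-NL.
BASELINE = "R3D-NL"


def gains_over_baseline(task):
    """Return each method's Average-mAP difference from the baseline, in mAP points."""
    base = scores[task][BASELINE]
    return {
        method: round(value - base, 2)
        for method, value in scores[task].items()
        if method != BASELINE
    }


for task in scores:
    print(task, gains_over_baseline(task))
```

Under that assumption, the 1.49 figure matches the SCI3D row for atomic actions (96.77 against 95.28), while the 17.57 figure matches the single stream SCI3D row for composite actions (69.76 against 52.19). The two-stream SCI3D row gives a composite gain of 14.52 points.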
