8 months ago

Action Recognition

Convolutional Neural Network

Method/Architecture

Computer Vision

Akash Singh, Tom De Schepper, Kevin Mets, Peter Hellinckx, José Oramas, Steven Latré

Abstract

In recent years, multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANNs) still struggle to classify them. In the real world, atomic actions often temporally connect to form more complex composite actions. The challenge lies in recognising composite actions of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concept of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two-stream relational network that employs a latent representation of state and a visual representation for reasoning over events and actions. The network learns to reason about temporally connected actions in order to identify all of them in the video. The proposed method achieves an improvement of around 1.49% mAP in atomic action recognition and 17.57% mAP in composite action recognition, over an I3D-NL baseline, on the CATER dataset.
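The results in the abstract are reported as mAP (mean average precision). As a reading aid, here is a minimal sketch of how mAP is commonly computed in a multi-label setting: average precision is calculated per class from ranked scores, then averaged over classes. The function names, the toy scores and the handling of classes without positives are assumptions made for illustration; this is not the authors' evaluation code or the CATER benchmark protocol.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one class: mean of precision@k over the ranks k where a positive appears."""
    # Rank videos by descending score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precisions: List[float] = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    # AP is undefined for a class with no positives; returning 0.0 is an assumption here.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix: Sequence[Sequence[float]],
                           label_matrix: Sequence[Sequence[int]]) -> float:
    """score_matrix[i][c] is the score of video i for class c; labels are 0 or 1."""
    num_classes = len(score_matrix[0])
    per_class = [
        average_precision([row[c] for row in score_matrix],
                          [row[c] for row in label_matrix])
        for c in range(num_classes)
    ]
    return sum(per_class) / num_classes


# Toy example (hypothetical numbers): 3 videos, 2 action classes.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
toy_labels = [[1, 0], [0, 1], [1, 1]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```

Because each class is scored independently, a video can be a positive for several actions at once, which is what makes the metric suitable for multi-label recognition.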

