Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors

Wang Lei ; Koniusz Piotr

Abstract

In this paper, we build on a concept of self-supervision by taking RGB frames as input to learn to predict both action concepts and auxiliary descriptors, e.g., object descriptors. So-called hallucination streams are trained to predict auxiliary cues, simultaneously fed into classification layers, and then hallucinated at the testing stage to aid the network. We design and hallucinate two descriptors, one leveraging four popular object detectors applied to training videos, and the other leveraging image- and video-level saliency detectors. The first descriptor encodes the detector- and ImageNet-wise class prediction scores, confidence scores, and spatial locations of bounding boxes and frame indexes to capture the spatio-temporal distribution of features per video. Another descriptor encodes spatio-angular gradient distributions of saliency maps and intensity patterns. Inspired by the characteristic function of the probability distribution, we capture four statistical moments on the above intermediate descriptors. As numbers of coefficients in the mean, covariance, coskewness and cokurtosis grow linearly, quadratically, cubically and quartically w.r.t. the dimension of feature vectors, we describe the covariance matrix by its leading n' eigenvectors (so-called subspace) and we capture skewness/kurtosis rather than costly coskewness/cokurtosis. We obtain state-of-the-art results on five popular datasets such as Charades and EPIC-Kitchens.
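
To make the moment-and-subspace idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name moment_subspace_descriptor, the parameter n_sub (standing in for n'), the per-dimension standardization, and the choice to concatenate the four parts into one vector are all hypothetical. It shows why the compact form is cheaper: for d-dimensional features, the output has d(3 + n') entries, whereas full coskewness and cokurtosis tensors would need d^3 and d^4 coefficients.

```python
import numpy as np

def moment_subspace_descriptor(features, n_sub=3, eps=1e-8):
    """Summarize an (N, d) set of feature vectors by four compact moments.

    Hypothetical helper: the layout of the output is an assumption,
    not taken from the paper.
    """
    x = np.asarray(features, dtype=np.float64)
    n, d = x.shape

    # First moment: the mean, d coefficients.
    mean = x.mean(axis=0)
    centered = x - mean

    # Second moment: covariance has O(d^2) coefficients, so keep only
    # its n_sub leading eigenvectors (the "subspace"), d * n_sub values.
    cov = centered.T @ centered / max(n - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_sub]   # indices of the largest ones
    subspace = eigvecs[:, order]                # shape (d, n_sub)

    # Third and fourth moments: per-dimension skewness and excess kurtosis
    # (d values each) instead of d^3 coskewness / d^4 cokurtosis tensors.
    z = centered / (centered.std(axis=0) + eps)
    skewness = (z ** 3).mean(axis=0)
    kurtosis = (z ** 4).mean(axis=0) - 3.0

    return np.concatenate([mean, subspace.ravel(), skewness, kurtosis])
```

Note that eigenvectors are only defined up to sign, so any pipeline that compares subspaces across videos would need a sign convention or a sign-invariant representation; the sketch does not address this.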

Benchmarks

Benchmark | Methodology | Metrics
action-classification-on-charades | DEEP-HAL with ODF+SDF (I3D) | MAP: 50.16
action-classification-on-charades | DEEP-HAL with ODF+SDF (AssembleNet++) | MAP: 62.29
action-recognition-in-videos-on-hmdb-51 | DEEP-HAL with ODF+SDF (I3D) | Average accuracy of 3 splits: 87.56
egocentric-activity-recognition-on-epic-1 | DEEP-HAL with ODF+SDF (AssembleNet++) | Actions Top-1 (S1): 35.8; Actions Top-1 (S2): 27.3
scene-recognition-on-yup | DEEP-HAL with ODF+SDF (I3D) | Accuracy (%): 94.4
