Anshul Shah, Shlok Mishra, Ankan Bansal, Jun-Cheng Chen, Rama Chellappa, Abhinav Shrivastava

Abstract
Recent progress on action recognition has mainly focused on RGB and optical flow features. In this paper, we approach the problem of joint-based action recognition. Unlike other modalities, the constellation of joints and their motion yields a succinct representation of human motion for activity recognition. We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder before performing collective reasoning. Our joint selector module re-weights the joint information to select the most discriminative joints for the task. We also propose a novel joint-contrastive loss that pulls together groups of joint features which convey the same action. We strengthen the joint-based representations by using a geometry-aware data augmentation technique which jitters pose heatmaps while retaining the dynamics of the action. We show large improvements over the current state-of-the-art joint-based approaches on the JHMDB, HMDB, Charades, and AVA action recognition datasets. A late fusion with RGB- and flow-based approaches yields additional improvements. Our model also outperforms the existing baseline on Mimetics, a dataset with out-of-context actions.
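
The per-joint encoding and joint re-weighting described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the encoder (mean frame-to-frame displacement followed by a tiny linear layer), the softmax-based selector, the random weights, and all names (`shared_motion_encoder`, `joint_selector`, `scoring_vector`) are assumptions chosen only to show the data flow, in which every joint passes through the same encoder and the resulting features are then weighted and pooled.

```python
import math
import random


def shared_motion_encoder(track, weights):
    """Encode one joint's (x, y) track; the same weights serve every joint.

    Hypothetical stand-in for a learned encoder: average the frame-to-frame
    displacement, then apply a small linear layer with tanh.
    """
    dx = [b[0] - a[0] for a, b in zip(track, track[1:])]
    dy = [b[1] - a[1] for a, b in zip(track, track[1:])]
    mean_motion = (sum(dx) / len(dx), sum(dy) / len(dy))
    return [math.tanh(w[0] * mean_motion[0] + w[1] * mean_motion[1])
            for w in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_selector(features, scoring_vector):
    """Assumed selector: softmax attention over joints, then weighted pooling."""
    scores = [sum(f * v for f, v in zip(feat, scoring_vector))
              for feat in features]
    attn = softmax(scores)
    dim = len(features[0])
    pooled = [sum(a * feat[d] for a, feat in zip(attn, features))
              for d in range(dim)]
    return attn, pooled


# Illustrative sizes and random data only; nothing here comes from the paper.
random.seed(0)
num_joints, num_frames, feat_dim = 15, 8, 4
tracks = [[(random.random(), random.random()) for _ in range(num_frames)]
          for _ in range(num_joints)]
encoder_weights = [[random.uniform(-1, 1), random.uniform(-1, 1)]
                   for _ in range(feat_dim)]
scoring_vector = [random.uniform(-1, 1) for _ in range(feat_dim)]

features = [shared_motion_encoder(t, encoder_weights) for t in tracks]
attn, pooled = joint_selector(features, scoring_vector)
```

Here `attn` holds one weight per joint (summing to 1) and `pooled` is the re-weighted clip-level feature. In a trained model the encoder and selector parameters would be learned rather than sampled at random.
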
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| action-classification-on-charades | JMRN (Pose only) | mAP: 16.2 |
| action-classification-on-charades | JMRN + R101-NL-LFB | mAP: 43.23 |
| action-recognition-in-videos-on-ava-v21 | JMRN + SlowFast-R101-NL | mAP (Val): 28.4 |
| action-recognition-in-videos-on-hmdb-51 | JMRN | Average accuracy of 3 splits: 54.2 |
| action-recognition-in-videos-on-hmdb-51 | JMRN + ResNeXt101 BERT | Average accuracy of 3 splits: 84.53 |
| action-recognition-on-mimetics | SIP-Net | mAP: 38.3 |
| action-recognition-on-mimetics | JMRN | mAP: 40 |
| skeleton-based-action-recognition-on-jhmdb-2d | JMRN (No GT pose) | Average accuracy of 3 splits: 68.55 |
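
Several rows above pair JMRN with an RGB model (for example, JMRN + R101-NL-LFB), which corresponds to the late fusion mentioned in the abstract. The sketch below shows one common form of late fusion, a weighted average of per-class scores. The fusion rule, the `pose_weight` parameter, and the score values are assumptions for illustration; the page does not state how the authors combine the two streams.

```python
def late_fusion(pose_scores, rgb_scores, pose_weight=0.5):
    """Weighted average of per-class scores from two models (assumed rule)."""
    if len(pose_scores) != len(rgb_scores):
        raise ValueError("both models must score the same set of classes")
    return [pose_weight * p + (1.0 - pose_weight) * r
            for p, r in zip(pose_scores, rgb_scores)]


# Made-up scores for three classes, purely illustrative.
pose_scores = [0.1, 0.7, 0.2]
rgb_scores = [0.5, 0.3, 0.2]

fused = late_fusion(pose_scores, rgb_scores, pose_weight=0.5)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

With equal weights, the class that both models score reasonably well wins even though the RGB model alone would pick a different one. In practice `pose_weight` would be tuned on a validation split.
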