Srijan Das, Saurav Sharma, Rui Dai, Francois Bremond, Monique Thonnat

Abstract
In this paper, we focus on the spatio-temporal aspect of recognizing Activities of Daily Living (ADL). ADL have two specific properties: (i) subtle spatio-temporal patterns and (ii) similar visual patterns that vary with time. As a result, different ADL may look very similar, and distinguishing them often requires examining their fine-grained details. Because recent spatio-temporal 3D ConvNets are too rigid to capture the subtle visual patterns across an action, we propose a novel Video-Pose Network (VPN). The two key components of VPN are a spatial embedding and an attention network. The spatial embedding projects the 3D poses and RGB cues into a common semantic space, which enables the action recognition framework to learn better spatio-temporal features by exploiting both modalities. To discriminate similar actions, the attention network provides two functionalities: (i) an end-to-end learnable pose backbone that exploits the topology of the human body, and (ii) a coupler that provides joint spatio-temporal attention weights across a video. Experiments show that VPN outperforms state-of-the-art results for action classification on a large-scale human activity dataset, NTU-RGB+D 120; its subset, NTU-RGB+D 60; a challenging real-world human activity dataset, Toyota Smarthome; and a small-scale human-object interaction dataset, Northwestern UCLA.
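The coupler's joint spatio-temporal attention can be pictured with a small sketch. This is a minimal illustration, not the authors' implementation: the score lists stand in for outputs of the pose backbone, and the softmax normalization, the outer-product combination, and every function name here are assumptions made for the example.

```python
import math


def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(spatial_scores, temporal_scores):
    """Combine per-region and per-frame scores into a T x S weight grid.

    Assumption: the joint weight of (frame t, region s) is the product
    of a temporal softmax weight and a spatial softmax weight.
    """
    spatial = softmax(spatial_scores)
    temporal = softmax(temporal_scores)
    return [[w_t * w_s for w_s in spatial] for w_t in temporal]


def modulate(features, weights):
    """Scale each channel vector features[t][s] by weights[t][s]."""
    return [
        [[weights[t][s] * c for c in features[t][s]]
         for s in range(len(features[t]))]
        for t in range(len(features))
    ]


# Hypothetical pose-derived scores: 3 spatial regions, 2 time steps.
weights = joint_attention([0.0, 1.0, 2.0], [0.5, 0.5])

# Hypothetical RGB feature map: 2 time steps x 3 regions x 2 channels.
rgb_features = [[[1.0, 2.0] for _ in range(3)] for _ in range(2)]
attended = modulate(rgb_features, weights)
```

In this sketch the pose stream only decides where and when to look, while the RGB stream supplies the features being reweighted. That matches the division of roles described in the abstract, under the assumptions stated above.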
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| action-classification-on-toyota-smarthome | VPN (RGB + Pose) | CS: 60.8; CV1: 43.8; CV2: 53.5 |
| action-recognition-in-videos-on-ntu-rgbd | VPN (RGB + Pose) | Accuracy (CS): 95.5; Accuracy (CV): 98.0 |
| action-recognition-in-videos-on-ntu-rgbd-120 | VPN (RGB + Pose) | Accuracy (Cross-Subject): 86.3; Accuracy (Cross-Setup): 87.8 |
| skeleton-based-action-recognition-on-n-ucla | VPN (RGB + Pose) | Accuracy: 93.5 |
| skeleton-based-action-recognition-on-ntu-rgbd-1 | VPN | Accuracy (Cross-Setup): 87.8; Accuracy (Cross-Subject): 86.3 |