Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

Liu Ziyu ; Zhang Hongwen ; Chen Zhenghao ; Wang Zhiyong ; Ouyang Wanli

Abstract

Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multi-scale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
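The abstract does not spell out how the multi-scale operator is disentangled. One plausible reading, sketched below in plain Python, is that each scale k keeps only the joints at shortest-path distance exactly k (plus self-loops), rather than every joint within k hops as a binarized adjacency power would. That definition, the function names, and the 5-joint chain are illustrative assumptions, not the authors' code from kenziyuliu/ms-g3d.

```python
from collections import deque


def hop_distances(num_nodes, edges):
    """All-pairs hop counts on an undirected graph, computed with BFS."""
    neighbors = [[] for _ in range(num_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = [[None] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] is None:  # not visited yet
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def k_hop_adjacency(dist, k):
    """Assumed disentangled scale: 1 where distance == k, plus self-loops."""
    n = len(dist)
    return [[1 if (dist[i][j] == k or i == j) else 0 for j in range(n)]
            for i in range(n)]


def within_k_hops(dist, k):
    """Binarized adjacency-power scale: 1 where distance <= k."""
    n = len(dist)
    return [[1 if (dist[i][j] is not None and dist[i][j] <= k) else 0
             for j in range(n)]
            for i in range(n)]


# Hypothetical toy skeleton: a chain of five joints 0-1-2-3-4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
dist = hop_distances(5, edges)

print(k_hop_adjacency(dist, 2)[0])  # joint 0 sees itself and joint 2 only
print(within_k_hops(dist, 2)[0])    # joint 0 sees joints 0, 1 and 2
```

Under this reading, the scale-2 operator no longer re-includes the 1-hop neighbor, so each scale contributes a distinct neighborhood and nearby joints are not counted again at every larger scale.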

Code Repositories

Repository	Framework	Notes
kenziyuliu/ms-g3d	pytorch	Official; Mentioned in GitHub
kennymckormick/pyskl	pytorch	Mentioned in GitHub
metrics-lab/st-fmri	pytorch	Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
3d-action-recognition-on-assembly101	MS-G3D	Actions Top-1: 28.7; Object Top-1: 36.3; Verbs Top-1: 65.7
action-recognition-on-h2o-2-hands-and-objects	MS-G3D	Actions Top-1: 50.83; Hand Pose: 3D; Object Label: No; Object Pose: No; RGB: No
skeleton-based-action-recognition-on-kinetics	MS-G3D	Accuracy: 38.0
skeleton-based-action-recognition-on-ntu-rgbd	MS-G3D Net	Accuracy (CS): 91.5; Accuracy (CV): 96.2
skeleton-based-action-recognition-on-ntu-rgbd-1	MS-G3D Net	Accuracy (Cross-Setup): 88.4%; Accuracy (Cross-Subject): 86.9%
