
Abstract

Human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt graph convolutional networks (GCN) to extract features on top of human skeletons. Despite the positive results shown in previous works, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. In this work, we propose PoseC3D, a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons. Compared to GCN-based methods, PoseC3D is more effective in learning spatiotemporal features, more robust against pose estimation noises, and generalizes better in cross-dataset settings. Also, PoseC3D can handle multiple-person scenarios without additional computation cost, and its features can be easily integrated with other modalities at early fusion stages, which provides a great design space to further boost the performance. On four challenging datasets, PoseC3D consistently obtains superior performance, when used alone on skeletons and in combination with the RGB modality.
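To make the idea of a "3D heatmap stack" more concrete, the following is a minimal NumPy sketch of one way such a volume could be built from 2D keypoints. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how heatmaps are generated, so the Gaussian shape, the confidence scaling, the `sigma` parameter, the max-merge across persons, and the function name `keypoints_to_heatmap_volume` are all hypothetical choices.

```python
import numpy as np


def keypoints_to_heatmap_volume(keypoints, scores, height, width, sigma=0.6):
    """Stack per-frame joint heatmaps into a (K, T, H, W) volume.

    Assumed (hypothetical) input layout:
      keypoints: (P, T, K, 2) array of (x, y) pixel coordinates
      scores:    (P, T, K) array of keypoint confidences
    where P = persons, T = frames, K = joints.
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    num_persons, num_frames, num_joints, _ = keypoints.shape

    # Pixel coordinate grids, broadcast to (H, W) inside the loop.
    ys = np.arange(height, dtype=np.float32)[:, None]
    xs = np.arange(width, dtype=np.float32)[None, :]

    volume = np.zeros((num_joints, num_frames, height, width), dtype=np.float32)
    for p in range(num_persons):
        for t in range(num_frames):
            for k in range(num_joints):
                x, y = keypoints[p, t, k]
                # Assumption: a Gaussian centered on the joint, scaled by confidence.
                gaussian = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
                gaussian *= scores[p, t, k]
                # Assumption: persons are merged with an element-wise maximum,
                # so the output shape does not depend on the number of persons.
                volume[k, t] = np.maximum(volume[k, t], gaussian)
    return volume
```

In this sketch the output shape depends only on the number of joints, frames, and the spatial resolution, so adding more people changes the cost of building the volume but not the size of the input a downstream 3D network would see. That is one plausible reading of the multiple-person claim above, offered here as an assumption.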
