HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

Joint Mixing Data Augmentation for Skeleton-based Action Recognition

{Zengfu Wang Linhua Xiang}

Abstract

Skeleton-based action recognition is beneficial for understanding human behavior in videos, and thus has received much attention in recent years as an important research area in action recognition. Current research focuses on designing more advanced algorithms to better extract spatio-temporal information from skeleton data. However, due to the small amount of data in the existing skeleton dataset and the lack of effective data augmentation methods, it is easy to lead to overfitting in model training. To address this challenge, we propose a mix-based data augmentation method, Joint Mixing Data Augmentation (JMDA), which can generally improve the effectiveness and robustness of various skeleton-based action recognition algorithms. In terms of spatial information, we introduce SpatialMix (SM), a method that projects the original 3D skeleton discrete information into a 2D space. Then, SM mixes the projected spatial information between two random samples during the training process to achieve the spatial-based mixing data augmentation. Concerning temporal information, we propose TemporalMix (TM). Leveraging the temporal continuity in skeleton data, we perform a temporal resize operation on the original skeleton data, and then merge two random samples during training to achieve the temporal-based mixed data augmentation. Additionally, we analyze the Feature Mismatch (FM) problem caused by introducing mix-based data augmentation into skeleton data. Then we propose a new data preprocessing method called Feature Alignment (FA) to effectively address this problem and improve model performance. Moreover, we propose a novel training pipeline, Joint Training Strategy (JTS), which combines multiple mix-based data augmentation methods for further improvement of model performance. Specifically, our proposed JMDA is plug-and-play and widely applicable to skeleton-based action recognition models. At the same time, the application of JMDA does not increase the model parameters and there is almost no additional training cost. We conduct extensive experiments on NTU RGB+D 60 and NTU RGB+D 120 datasets to demonstrate the effectiveness and robustness of the proposed JMDA on several mainstream skeleton-based action recognition algorithms.

Benchmarks

BenchmarkMethodologyMetrics
skeleton-based-action-recognition-on-ntu-rgbdJMDA (based on Skeleton MixFormer)
Accuracy (CS): 93.7
Accuracy (CV): 97.2
skeleton-based-action-recognition-on-ntu-rgbd-1JMDA (based on Skeleton MixFormer)
Accuracy (Cross-Setup): 91.9
Accuracy (Cross-Subject): 90.9

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
Joint Mixing Data Augmentation for Skeleton-based Action Recognition | Papers | HyperAI