Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition

Bruce X.B. Yu, Yan Liu, Keith C.C. Chan

Abstract

Indoor action recognition plays an important role in modern society, for example in intelligent healthcare in large mobile cabin hospitals. With the wide adoption of depth sensors such as Kinect, multimodal information including the skeleton and RGB modalities offers a promising way to improve recognition performance. However, existing methods either focus on a single data modality or fail to take full advantage of multiple modalities. In this paper, we propose a Teacher-Student Multimodal Fusion (TSMF) model that fuses the skeleton and RGB modalities at the model level for indoor action recognition. In TSMF, a teacher network transfers the structural knowledge of the skeleton modality to a student network for the RGB modality. Extensive experiments on two benchmark datasets, NTU RGB+D and PKU-MMD, show that the proposed TSMF consistently outperforms state-of-the-art single-modal and multimodal methods. The results also indicate that TSMF not only improves the accuracy of the student network but also significantly improves the ensemble accuracy.
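The abstract describes a teacher-student scheme in which a skeleton-modality teacher guides an RGB-modality student, and an ensemble of the two streams. The paper's exact objective is not reproduced on this page; the following is a minimal generic sketch of such a setup, assuming a standard soft-target distillation loss and weighted score-level fusion (the function names, temperature, and weighting scheme are illustrative, not taken from the paper):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend hard-label cross-entropy on the student with a soft-target
    term that pulls the student's distribution toward the teacher's."""
    # Soft-target term: cross-entropy between softened distributions,
    # scaled by T^2 as is conventional in distillation.
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student_t = np.log(softmax(student_logits, temperature) + 1e-12)
    soft = -(p_teacher * log_p_student_t).sum(axis=-1).mean() * temperature**2
    # Hard-label term: ordinary cross-entropy against ground truth.
    log_p_student = np.log(softmax(student_logits) + 1e-12)
    hard = -log_p_student[np.arange(len(labels)), labels].mean()
    return alpha * soft + (1 - alpha) * hard

def late_fusion(teacher_logits, student_logits, weight=0.5):
    """Ensemble the skeleton (teacher) and RGB (student) streams by
    weighted averaging of their class probabilities."""
    return (weight * softmax(teacher_logits)
            + (1 - weight) * softmax(student_logits))
```

In this sketch the teacher (skeleton stream) would typically be trained first and then frozen while the student (RGB stream) is trained with `distillation_loss`; at test time `late_fusion` combines the two streams' scores, which is one common way an ensemble of this kind can outperform either stream alone.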

Benchmarks

Action Recognition in Videos on NTU RGB+D — TSMF (RGB + Pose): Accuracy (CS) 92.5, Accuracy (CV) 97.4
Action Recognition in Videos on PKU-MMD — TSMF: X-Sub 95.8, X-View 97.8
