A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video

Junfa Liu Juan Rojas Zhijun Liang Yihui Li Yisheng Guan

Abstract

Spatio-temporal information is key to resolving occlusion and depth ambiguity in 3D pose estimation. Previous methods have focused either on temporal contexts or on local-to-global architectures that embed fixed-length spatio-temporal information. To date, there have been no effective proposals that simultaneously capture spatio-temporal sequences of varying length in a flexible way and achieve real-time 3D pose estimation. In this work, we improve the learning of kinematic constraints in the human skeleton (posture, local kinematic connections, and symmetry) by modeling local and global spatial information via attention mechanisms. To support both single- and multi-frame estimation, we employ a dilated temporal model that processes skeleton sequences of varying length. Importantly, we also carefully design the interleaving of spatial semantics with temporal dependencies to achieve a synergistic effect. To this end, we propose a simple yet effective graph attention spatio-temporal convolutional network (GAST-Net) that consists of interleaved temporal convolutional and graph attention blocks. Experiments on two challenging benchmark datasets (Human3.6M and HumanEva-I) and on YouTube videos demonstrate that our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to upper-body (half-body) estimation, and achieves competitive performance on 2D-to-3D video pose estimation. Code, video, and supplementary information are available at: http://www.juanrojas.net/gast/
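The dilated temporal model mentioned in the abstract can be illustrated with a minimal, single-channel sketch. It assumes a kernel size of 3 with dilation factors 1, 3, 9, 27 (each layer's dilation is 3 raised to the layer index), a common choice in dilated temporal pose models; this is an assumption and not a description of GAST-Net's actual layers, which use many channels and interleave graph attention blocks with the temporal convolutions. Under that assumption, stacking 2, 3 or 4 layers yields receptive fields of 9, 27 and 81 frames, which matches the T values listed under Benchmarks. The names `receptive_field` and `dilated_conv1d` are hypothetical.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Frames seen by one output frame of stacked dilated 1D convolutions
    (stride 1, no padding): 1 + sum((k - 1) * d)."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(signal: Sequence[float], weights: Sequence[float],
                   dilation: int) -> List[float]:
    """'Valid' dilated 1D convolution (cross-correlation, as in deep-learning
    frameworks): each output mixes k inputs spaced `dilation` frames apart,
    so the output is (k - 1) * dilation frames shorter than the input."""
    k = len(weights)
    span = (k - 1) * dilation
    return [
        sum(weights[j] * signal[t + j * dilation] for j in range(k))
        for t in range(len(signal) - span)
    ]


# Hypothetical configuration: kernel size 3, dilations grow by a factor of 3.
for n_layers in (2, 3, 4):
    dilations = [3 ** i for i in range(n_layers)]
    print(n_layers, dilations, receptive_field(3, dilations))
# prints:
# 2 [1, 3] 9
# 3 [1, 3, 9] 27
# 4 [1, 3, 9, 27] 81

# One coordinate of one joint over 27 frames collapses to a single output frame.
x = [float(t) for t in range(27)]
for d in (1, 3, 9):
    x = dilated_conv1d(x, [1 / 3, 1 / 3, 1 / 3], d)
print(len(x))  # 1
```

Because each layer trims (k - 1) * d frames, a window of exactly T input frames produces one output frame, which is how a single model can serve multi-frame inputs while a small T keeps latency low.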

Code Repositories

fabro66/GAST-Net-3DPoseEstimation (official implementation, PyTorch)

Benchmarks

3D human pose estimation on Human3.6M (T = number of input frames, i.e. the temporal receptive field):

| Method | Average MPJPE (mm) | PA-MPJPE (mm) |
|---|---|---|
| GAST (T=9) | 49 | 37.4 |
| GAST (T=27) | 46.2 | 36 |
| GAST (T=81) | 45.7 | 35.9 |

3D human pose estimation on HumanEva-I:

| Method | Mean Reconstruction Error (mm) |
|---|---|
| GAST | 21.2 |
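The MPJPE figures above are averages of per-joint Euclidean errors. The sketch below computes MPJPE for a single pose in plain Python (3.8+ for `math.dist`); the function name `mpjpe` and the toy coordinates are hypothetical. PA-MPJPE uses the same average after first aligning the prediction to the ground truth with a Procrustes (similarity) transform, which is omitted here, and benchmark-specific details such as root-joint centering and averaging over frames and actions are not reproduced.

```python
import math
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z), e.g. in millimetres


def mpjpe(pred: Sequence[Joint], target: Sequence[Joint]) -> float:
    """Mean per-joint position error for one pose: the average Euclidean
    distance between matching predicted and ground-truth joints."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    return sum(math.dist(p, q) for p, q in zip(pred, target)) / len(pred)


# Toy two-joint example: joint 0 is exact, joint 1 is off by a 3-4-5 offset.
pred = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (103.0, 4.0, 0.0)]
print(mpjpe(pred, target))  # 2.5
```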
