3D Human Pose Estimation with Spatial and Temporal Transformers

Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding

Abstract

Transformer architectures have become the model of choice in natural language processing and are now being introduced into computer vision tasks such as image classification, object detection, and semantic segmentation. However, in the field of human pose estimation, convolutional architectures still remain dominant. In this work, we present PoseFormer, a purely transformer-based approach for 3D human pose estimation in videos without convolutional architectures involved. Inspired by recent developments in vision transformers, we design a spatial-temporal transformer structure to comprehensively model the human joint relations within each frame as well as the temporal correlations across frames, then output an accurate 3D human pose of the center frame. We quantitatively and qualitatively evaluate our method on two popular and standard benchmark datasets: Human3.6M and MPI-INF-3DHP. Extensive experiments show that PoseFormer achieves state-of-the-art performance on both datasets. Code is available at https://github.com/zczcwh/PoseFormer
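
The two-stage data flow described in the abstract (attention over joints within each frame, then attention over frames, then a 3D pose for the center frame) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names (`self_attention`, `pose_sketch`), the embedding width `c`, the random untrained weights, the identity query/key/value projections, and the choice to read the output from the center-frame token are all assumptions made for illustration. It also omits positional embeddings, multi-head attention, layer normalization, and MLP blocks.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): queries, keys, and values are the tokens
    themselves, with no learned projections.
    """
    d = len(tokens[0])
    transposed = [list(col) for col in zip(*tokens)]
    scores = matmul(tokens, transposed)  # n x n token similarities
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, tokens)  # n x d


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights (hypothetical, untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def pose_sketch(frames, c=4, seed=0):
    """Map f frames of J 2D joints to one J x 3 pose for the center frame."""
    rng = random.Random(seed)
    f, num_joints = len(frames), len(frames[0])
    w_embed = random_matrix(2, c, rng)  # per-joint embedding: 2 -> c
    w_head = random_matrix(num_joints * c, num_joints * 3, rng)  # regression head

    # Spatial stage: within each frame, every joint attends to every other joint.
    frame_tokens = []
    for joints in frames:
        joint_tokens = matmul(joints, w_embed)  # J x c
        attended = self_attention(joint_tokens)  # J x c
        frame_tokens.append([v for tok in attended for v in tok])  # length J * c

    # Temporal stage: each frame token attends to all other frames.
    temporal = self_attention(frame_tokens)  # f x (J * c)

    # Assumption: read the prediction from the center-frame token.
    center = temporal[f // 2]
    flat = matmul([center], w_head)[0]  # length J * 3
    return [flat[i * 3:(i + 1) * 3] for i in range(num_joints)]


# Usage: 9 frames of 17 joints (sizes chosen only for illustration).
rng = random.Random(1)
clip = [[[rng.random(), rng.random()] for _ in range(17)] for _ in range(9)]
pose = pose_sketch(clip)
print(len(pose), len(pose[0]))
```

The point of the sketch is the shape flow: a clip of shape (frames, joints, 2) becomes one token per joint in the spatial stage, one token per frame in the temporal stage, and finally a single (joints, 3) output.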

