3 months ago

VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation

Yuxing Chen, Renshu Gu, Ouhan Huang, Gangyong Jia

Abstract

This paper presents the Volumetric Transformer Pose estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and learns spatial relationships directly in 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions, flattened into sequential embeddings, and fed into a transformer. The transformer output is then concatenated with the 3D convolutional features through a residual design, which further improves performance. In addition, sparse Sinkhorn attention is employed to reduce memory cost, a major bottleneck for volumetric representations, while still achieving excellent performance. The proposed VTP framework combines the high performance of the transformer with volumetric representations and can serve as an alternative to convolutional backbones. Experiments on the Shelf, Campus and CMU Panoptic benchmarks show promising results in terms of both Mean Per Joint Position Error (MPJPE) and Percentage of Correctly estimated Parts (PCP). Our code will be available.
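The data flow described in the abstract (volumetric features, flattened into tokens, passed through attention, then concatenated back with the convolutional features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `volumetric_transformer_block`, the single-head dense attention, the weight shapes, and the 4x4x4 grid are all assumptions made for illustration, and the 3D convolutions and sparse Sinkhorn attention are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def volumetric_transformer_block(volume, w_q, w_k, w_v):
    """Hypothetical block. volume: (C, X, Y, Z) voxel features,
    standing in for the output of the 3D convolutions."""
    c, x, y, z = volume.shape
    # Flatten the voxel grid into N = X*Y*Z tokens of width C.
    tokens = volume.reshape(c, -1).T                    # (N, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Dense single-head attention (assumption): the (N, N) matrix
    # is where memory grows quadratically with the number of voxels.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))      # (N, N)
    out = attn @ v                                      # (N, C)
    # Restore the grid layout of the attended tokens.
    out_volume = out.T.reshape(c, x, y, z)
    # Residual design as described: concatenate along the channel axis.
    return np.concatenate([volume, out_volume], axis=0)  # (2C, X, Y, Z)

rng = np.random.default_rng(0)
volume = rng.standard_normal((4, 4, 4, 4))              # toy (C, X, Y, Z)
w_q, w_k, w_v = (0.1 * rng.standard_normal((4, 4)) for _ in range(3))
fused = volumetric_transformer_block(volume, w_q, w_k, w_v)
print(fused.shape)  # (8, 4, 4, 4)
```

With this toy grid there are only 64 tokens, so the attention matrix has 4,096 entries. For a hypothetical 64x64x64 grid the same dense matrix would need roughly 6.9e10 entries, which is the kind of cost that motivates a sparse attention variant.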

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| 3d-human-pose-estimation-on-cmu-panoptic | VTP | Average MPJPE (mm): 17.62 |
| 3d-multi-person-pose-estimation-on-campus | VTP | Mean mAP: 80.1; PCP3D: 96.3 |
| 3d-multi-person-pose-estimation-on-shelf | VTP | MPJPE: 56.3; PCP3D: 97.3 |
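
For readers unfamiliar with the two metrics reported above, the sketch below shows how they are commonly computed. It assumes the widely used definitions: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and a limb counts as correct for PCP when the mean error of its two endpoints is at most `alpha` times the ground-truth limb length, with `alpha = 0.5`. The paper's exact evaluation protocol may differ, and the three-joint skeleton and its coordinates are hypothetical.

```python
import math

def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def pcp(pred, gt, limbs, alpha=0.5):
    """Percentage of limbs whose mean endpoint error is at most
    alpha times the ground-truth limb length (assumed definition)."""
    correct = 0
    for a, b in limbs:
        endpoint_err = (math.dist(pred[a], gt[a]) + math.dist(pred[b], gt[b])) / 2
        if endpoint_err <= alpha * math.dist(gt[a], gt[b]):
            correct += 1
    return 100.0 * correct / len(limbs)

# Toy skeleton (hypothetical): 3 joints in mm, 2 limbs of 300 mm each.
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 300.0), (0.0, 0.0, 600.0)]
pred = [(30.0, 0.0, 0.0), (0.0, 40.0, 300.0), (0.0, 0.0, 890.0)]
limbs = [(0, 1), (1, 2)]

print(mpjpe(pred, gt))       # 120.0  (joint errors: 30, 40, 290 mm)
print(pcp(pred, gt, limbs))  # 50.0   (limb (1, 2) exceeds the 150 mm threshold)
```

Lower MPJPE is better, while higher PCP is better. The two can diverge, because a single badly placed joint raises MPJPE sharply yet affects only the limbs attached to it.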
