| PCT (Swin-L, test set) | 94.3 | Human Pose as Compositional Tokens | |
| Cascade Feature Aggregation | 93.9 | Cascade Feature Aggregation for Human Pose Estimation | - |
| PCT (Swin-B, test set) | 93.8 | Human Pose as Compositional Tokens | |
| Multi-Scale Structure-Aware Network | 92.1 | Multi-Scale Structure-Aware Network for Human Pose Estimation | - |
| Pyramid Residual Modules (PRMs) | 92.0 | Learning Feature Pyramids for Human Pose Estimation | |
| Stacked Hourglass + Inception-ResNet | 91.2 | Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation | |