Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity
Zhou Mu ; Stoffl Lucas ; Mathis Mackenzie Weygandt ; Mathis Alexander

Abstract
Frequent interactions between individuals are a fundamental challenge for pose estimation algorithms. Current pipelines either use an object detector together with a pose estimator (top-down approach), or localize all body parts first and then link them to predict the pose of individuals (bottom-up). Yet, when individuals closely interact, top-down methods are ill-defined due to overlapping individuals, and bottom-up methods often falsely infer connections to distant body parts. Thus, we propose a novel pipeline called bottom-up conditioned top-down pose estimation (BUCTD) that combines the strengths of bottom-up and top-down methods. Specifically, we propose to use a bottom-up model as the detector, which in addition to an estimated bounding box provides a pose proposal that is fed as condition to an attention-based top-down model. We demonstrate the performance and efficiency of our approach on animal and human pose estimation benchmarks. On CrowdPose and OCHuman, we outperform previous state-of-the-art models by a significant margin. We achieve 78.5 AP on CrowdPose and 48.5 AP on OCHuman, an improvement of 8.6% and 7.8% over the prior art, respectively. Furthermore, we show that our method strongly improves the performance on multi-animal benchmarks involving fish and monkeys. The code is available at https://github.com/amathislab/BUCTD
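The detector step described in the abstract, where each bottom-up pose prediction yields both a bounding box and a pose proposal for the top-down model, can be illustrated with a minimal sketch. This is not the BUCTD implementation: the function names `bbox_from_keypoints` and `conditioned_inputs`, the `(x, y, confidence)` keypoint layout, the confidence threshold, and the box margin are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Keypoint = Tuple[float, float, float]    # (x, y, confidence), an assumed layout
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def bbox_from_keypoints(
    keypoints: List[Keypoint],
    margin: float = 0.25,    # hypothetical relative padding around the pose
    min_conf: float = 0.5,   # hypothetical confidence threshold
) -> Optional[Box]:
    """Return a padded box around the confident keypoints, or None if there are none."""
    confident = [(x, y) for x, y, c in keypoints if c >= min_conf]
    if not confident:
        return None
    xs = [x for x, _ in confident]
    ys = [y for _, y in confident]
    pad_x = margin * (max(xs) - min(xs))
    pad_y = margin * (max(ys) - min(ys))
    return (min(xs) - pad_x, min(ys) - pad_y, max(xs) + pad_x, max(ys) + pad_y)


def conditioned_inputs(
    poses: List[List[Keypoint]],
) -> List[Tuple[Box, List[Keypoint]]]:
    """Pair each bottom-up pose with its derived box.

    In a pipeline of this kind, each (box, pose) pair would be handed to the
    top-down model: the box selects the image crop, and the pose is the condition.
    """
    pairs = []
    for pose in poses:
        box = bbox_from_keypoints(pose)
        if box is not None:
            pairs.append((box, pose))
    return pairs
```

Because the box is derived from the pose proposal itself, two overlapping individuals still produce two distinct conditions even when their crops are nearly identical, which is the ambiguity the abstract attributes to detector-only top-down pipelines.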
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| animal-pose-estimation-on-fish-100 | HRNet-W48 + Faster R-CNN | mAP: 89.1 |
| animal-pose-estimation-on-fish-100 | BUCTD-preNet-W48 (DLCRNet) | mAP: 88.7 |
| animal-pose-estimation-on-fish-100 | BUCTD-preNet-W48 (CID-W32) | mAP: 88.0 |
| animal-pose-estimation-on-marmoset-8k | BUCTD-preNet-W48 (CID-W32) | mAP: 93.3 |
| animal-pose-estimation-on-marmoset-8k | BUCTD-CoAM-W48 (DLCRNet) | mAP: 91.6 |
| animal-pose-estimation-on-marmoset-8k | CID-W32 | mAP: 92.5 |
| animal-pose-estimation-on-trimouse-161 | BUCTD-CoAM-W48 (DLCRNet) | mAP: 99.1 |
| animal-pose-estimation-on-trimouse-161 | DLCRNet | mAP: 95.8 |
| animal-pose-estimation-on-trimouse-161 | CID-W32 | mAP: 86.8 |
| multi-person-pose-estimation-on-crowdpose | BUCTD-W48 (w/cond. input from PETR, and generative sampling) | mAP @0.5:0.95: 78.5; AP Easy: 83.9; AP Medium: 79.0; AP Hard: 72.3 |
| pose-estimation-on-coco | BUCTD (PETR, with generative sampling) | AP: 77.8; APM: 74.2; APL: 83.7 |
| pose-estimation-on-crowdpose | BUCTD-W48 | AP: 72.9 |
| pose-estimation-on-crowdpose | BUCTD-W48 (w/cond. input from PETR) | AP: 76.7 |
| pose-estimation-on-crowdpose | BUCTD-W48 (w/cond. input from PETR, and generative sampling) | AP: 78.5; AP Easy: 83.9; AP Medium: 79.0; AP Hard: 72.3 |
| pose-estimation-on-ochuman | BUCTD (CID-W32) | Validation AP: 47.7; Test AP: 47.2 |