5 months ago

Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity

Zhou Mu ; Stoffl Lucas ; Mathis Mackenzie Weygandt ; Mathis Alexander

Abstract

Frequent interactions between individuals are a fundamental challenge for pose estimation algorithms. Current pipelines either use an object detector together with a pose estimator (top-down approach), or localize all body parts first and then link them to predict the pose of individuals (bottom-up). Yet, when individuals closely interact, top-down methods are ill-defined due to overlapping individuals, and bottom-up methods often falsely infer connections to distant bodyparts. Thus, we propose a novel pipeline called bottom-up conditioned top-down pose estimation (BUCTD) that combines the strengths of bottom-up and top-down methods. Specifically, we propose to use a bottom-up model as the detector, which in addition to an estimated bounding box provides a pose proposal that is fed as condition to an attention-based top-down model. We demonstrate the performance and efficiency of our approach on animal and human pose estimation benchmarks. On CrowdPose and OCHuman, we outperform previous state-of-the-art models by a significant margin. We achieve 78.5 AP on CrowdPose and 48.5 AP on OCHuman, an improvement of 8.6% and 7.8% over the prior art, respectively. Furthermore, we show that our method strongly improves the performance on multi-animal benchmarks involving fish and monkeys. The code is available at https://github.com/amathislab/BUCTD
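The abstract's central idea, a bottom-up model acting as the detector and supplying both a bounding box and a pose proposal, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `bbox_from_pose` and `make_conditions`, the confidence threshold `min_conf`, and the padding factor `margin` are all hypothetical choices made for illustration. It assumes each bottom-up prediction is an array of `(x, y, confidence)` keypoints.

```python
import numpy as np

def bbox_from_pose(keypoints, min_conf=0.1, margin=0.25):
    """Derive a padded box (x0, y0, x1, y1) from one pose proposal.

    keypoints: (K, 3) array-like of x, y, confidence.
    min_conf and margin are illustrative assumptions, not values from the paper.
    Returns None if no keypoint passes the confidence threshold.
    """
    kpts = np.asarray(keypoints, dtype=float)
    visible = kpts[kpts[:, 2] >= min_conf]
    if len(visible) == 0:
        return None
    x0, y0 = visible[:, :2].min(axis=0)
    x1, y1 = visible[:, :2].max(axis=0)
    pad_x = margin * (x1 - x0)
    pad_y = margin * (y1 - y0)
    return (x0 - pad_x, y0 - pad_y, x1 + pad_x, y1 + pad_y)

def make_conditions(poses, **kwargs):
    """Pair every usable pose proposal with the box derived from it.

    Each returned item holds what a conditioned top-down stage would need:
    the crop region and the pose that serves as the condition.
    """
    conditions = []
    for pose in poses:
        box = bbox_from_pose(pose, **kwargs)
        if box is not None:
            conditions.append({"bbox": box, "cond_pose": np.asarray(pose, dtype=float)})
    return conditions
```

In such a setup, each item in the list would be handed to the top-down model together with the image crop, so that the conditioning pose tells the model which of several overlapping individuals inside the crop it should estimate.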

Code Repositories

amathislab/BUCTD (official implementation, PyTorch)

Benchmarks

Benchmark	Methodology	Metrics
animal-pose-estimation-on-fish-100	HRNet-W48 + Faster R-CNN	mAP: 89.1
animal-pose-estimation-on-fish-100	BUCTD-preNet-W48 (DLCRNet)	mAP: 88.7
animal-pose-estimation-on-fish-100	BUCTD-preNet-W48 (CID-W32)	mAP: 88.0
animal-pose-estimation-on-marmoset-8k	BUCTD-preNet-W48 (CID-W32)	mAP: 93.3
animal-pose-estimation-on-marmoset-8k	BUCTD-CoAM-W48 (DLCRNet)	mAP: 91.6
animal-pose-estimation-on-marmoset-8k	CID-W32	mAP: 92.5
animal-pose-estimation-on-trimouse-161	BUCTD-CoAM-W48 (DLCRNet)	mAP: 99.1
animal-pose-estimation-on-trimouse-161	DLCRNet	mAP: 95.8
animal-pose-estimation-on-trimouse-161	CID-W32	mAP: 86.8
multi-person-pose-estimation-on-crowdpose	BUCTD-W48 (w/cond. input from PETR, and generative sampling)	AP Easy: 83.9; AP Medium: 79.0; AP Hard: 72.3; mAP @0.5:0.95: 78.5
pose-estimation-on-coco	BUCTD (PETR, with generative sampling)	AP: 77.8; APM: 74.2; APL: 83.7
pose-estimation-on-crowdpose	BUCTD-W48	AP: 72.9
pose-estimation-on-crowdpose	BUCTD-W48 (w/cond. input from PETR)	AP: 76.7
pose-estimation-on-crowdpose	BUCTD-W48 (w/cond. input from PETR, and generative sampling)	AP: 78.5; AP Easy: 83.9; AP Medium: 79.0; AP Hard: 72.3
pose-estimation-on-ochuman	BUCTD (CID-W32)	Test AP: 47.2; Validation AP: 47.7
