Son Hyeongseok, Jung Sangil, Lee Solae, Kim Seongeun, Park Seung-In, Yoo ByungIn

Abstract
Humans are one of the most essential classes in visual recognition tasks such as detection, segmentation, and pose estimation. Although much effort has been put into the individual tasks, multi-task learning for these three tasks has rarely been studied. In this paper, we explore a compact multi-task network architecture that maximally shares the parameters of the multiple tasks via object-centric learning. To this end, we propose a novel query design, called the human-centric query (HCQ), to encode human instance information effectively. HCQ enables the query to learn explicit and structural information about humans, such as keypoints. Besides, we utilize HCQ directly in the prediction heads of the target tasks and also interweave HCQ with deformable attention in the Transformer decoders to exploit a well-learned object-centric representation. Experimental results show that the proposed multi-task network achieves accuracy comparable to state-of-the-art task-specific models in human detection, segmentation, and pose estimation, while consuming less computational cost.
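The core idea is that a single structured query per human instance feeds all three task heads. The following is a minimal sketch of that sharing pattern; the names (`HumanCentricQuery`, the head functions), embedding sizes, and head designs are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hedged sketch: one query per human instance carries an instance-level
# embedding plus explicit per-keypoint embeddings, and the detection,
# segmentation, and pose heads all read from this single shared query.
rng = np.random.default_rng(0)
D, K = 32, 17  # embedding dim and keypoint count (17 assumed, COCO-style)

class HumanCentricQuery:
    def __init__(self):
        self.instance = rng.standard_normal(D)        # whole-person embedding
        self.keypoints = rng.standard_normal((K, D))  # structural part embeddings

def detection_head(q, W_box):
    # Box regression from the instance embedding -> (cx, cy, w, h)
    return W_box @ q.instance

def pose_head(q, W_kpt):
    # Each keypoint embedding decodes to an (x, y) location independently
    return q.keypoints @ W_kpt.T  # shape (K, 2)

def segmentation_head(q, pixel_features):
    # Mask2Former-style dot product between the instance embedding and
    # per-pixel features yields a mask logit per pixel
    return pixel_features @ q.instance  # shape (num_pixels,)

q = HumanCentricQuery()
box = detection_head(q, rng.standard_normal((4, D)))
kpts = pose_head(q, rng.standard_normal((2, D)))
mask = segmentation_head(q, rng.standard_normal((64, D)))
print(box.shape, kpts.shape, mask.shape)  # (4,) (17, 2) (64,)
```

Because all heads consume the same query, the per-task parameters reduce to small projection layers, which is where the computational savings claimed in the abstract would come from.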
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| human-instance-segmentation-on-ochuman | Mask2Former | AP: 27.8 |
| human-instance-segmentation-on-ochuman | HCQNet | AP: 27.3 |
| human-instance-segmentation-on-ochuman | BaseNet-DPS | AP: 25.5 |