5 months ago

Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks

Weihua Chen; Xianzhe Xu; Jian Jia; Hao Luo; Yaohua Wang; Fan Wang; Rong Jin; Xiuyu Sun

Abstract

Human-centric visual tasks have attracted increasing research attention due to their widespread applications. In this paper, we aim to learn a general human representation from massive unlabeled human images that benefits downstream human-centric tasks as much as possible. We call this method SOLIDER, a Semantic cOntrollable seLf-supervIseD lEaRning framework. Unlike existing self-supervised learning methods, SOLIDER uses prior knowledge from human images to build pseudo semantic labels and import more semantic information into the learned representation. Meanwhile, we note that different downstream tasks require different ratios of semantic information and appearance information. For example, human parsing requires more semantic information, while person re-identification needs more appearance information for identification purposes. A single learned representation therefore cannot fit all requirements. To solve this problem, SOLIDER introduces a conditional network with a semantic controller. After the model is trained, users can send values to the controller to produce representations with different ratios of semantic information, which can fit the different needs of downstream tasks. Finally, SOLIDER is verified on six downstream human-centric visual tasks. It outperforms the state of the art and builds new baselines for these tasks. The code is released at https://github.com/tinyvision/SOLIDER.
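To make the idea of a value-driven controller concrete, here is a minimal pure-Python sketch of one way a scalar input could modulate a feature vector. It is not taken from the SOLIDER code. The class name `SemanticController`, the linear encoding of `lam` into a per-channel scale and shift, the softplus activation, the `[0, 1]` range for `lam`, and the random stand-in weights are all assumptions made for illustration.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class SemanticController:
    """Hypothetical controller (illustration only, not the authors' module).

    Maps a scalar lam to a per-channel scale and shift, then modulates a
    feature vector with them.
    """

    def __init__(self, channels: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random stand-ins for parameters that a trained model would learn.
        self.w_scale = [rng.uniform(-1.0, 1.0) for _ in range(channels)]
        self.b_scale = [1.0] * channels
        self.w_shift = [rng.uniform(-1.0, 1.0) for _ in range(channels)]
        self.b_shift = [0.0] * channels

    def __call__(self, features: list[float], lam: float) -> list[float]:
        # Assumption: the control value lies in [0, 1].
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam is assumed to lie in [0, 1]")
        if len(features) != len(self.w_scale):
            raise ValueError("feature length must match channel count")
        out = []
        for f, ws, bs, wt, bt in zip(
            features, self.w_scale, self.b_scale, self.w_shift, self.b_shift
        ):
            scale = ws * lam + bs  # linear encoding of lam into a scale
            shift = wt * lam + bt  # linear encoding of lam into a shift
            out.append(softplus(scale * f + shift))
        return out


controller = SemanticController(channels=4)
backbone_features = [0.5, -1.2, 2.0, 0.1]  # stand-in for one backbone output

# Same features, two control values, two different representations.
low_lam_repr = controller(backbone_features, lam=0.2)
high_lam_repr = controller(backbone_features, lam=1.0)
```

The point of the sketch is only that a single set of weights yields different representations depending on the value passed in at inference time. Which values suit which downstream task is not specified in the abstract and is left open here.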

Code Repositories

DengpanFu/LUPerson (PyTorch; mentioned in GitHub)
hasanirtiza/Pedestron (PyTorch; mentioned in GitHub)
tinyvision/SOLIDER (official; PyTorch; mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| pedestrian-attribute-recognition-on-pa-100k | SOLIDER | Accuracy: 86.38 |
| pedestrian-detection-on-citypersons | SOLIDER | Heavy MR^-2: 39.4; Reasonable MR^-2: 9.7 |
| person-re-identification-on-market-1501 | SOLIDER (RK) | Rank-1: 96.7; mAP: 95.6 |
| person-re-identification-on-market-1501 | SOLIDER | Rank-1: 96.9; mAP: 93.9 |
| person-re-identification-on-msmt17 | SOLIDER (with re-ranking) | Rank-1: 91.7; mAP: 86.5 |
| person-re-identification-on-msmt17 | SOLIDER (without re-ranking) | Rank-1: 90.7; mAP: 77.1 |
| person-search-on-cuhk-sysu | SOLIDER | mAP: 95.5; Top-1: 95.8 |
| person-search-on-prw | SOLIDER | Top-1: 86.7; mAP: 59.8 |
| pose-estimation-on-coco | SOLIDER (swin-B) | AP: 76.6; AR: 81.5 |
| semantic-segmentation-on-lip-val | SOLIDER | mIoU: 60.50% |
