5 months ago

PLIP: Language-Image Pre-training for Person Representation Learning

Jialong Zuo; Jiahao Hong; Feng Zhang; Changqian Yu; Hanyu Zhou; Changxin Gao; Nong Sang; Jingdong Wang

Abstract

Language-image pre-training is an effective technique for learning powerful representations in general domains. However, when applied directly to person representation learning, these general pre-training methods perform unsatisfactorily. The reason is that they neglect critical person-related characteristics, i.e., fine-grained attributes and identities. To address this issue, we propose a novel language-image pre-training framework for person representation learning, termed PLIP. Specifically, we design three pretext tasks: 1) Text-guided Image Colorization, which aims to establish the correspondence between person-related image regions and fine-grained color-part textual phrases; 2) Image-guided Attributes Prediction, which aims to mine fine-grained attribute information about the person's body in the image; and 3) Identity-based Vision-Language Contrast, which aims to correlate the cross-modal representations at the identity level rather than the instance level. Moreover, to implement our pre-training framework, we construct a large-scale person dataset of image-text pairs, named SYNTH-PEDES, by automatically generating textual annotations. We pre-train PLIP on SYNTH-PEDES and evaluate our models across a range of downstream person-centric tasks. PLIP not only significantly improves existing methods on all these tasks, but also shows strong ability in the zero-shot and domain generalization settings. The code, dataset and weights will be released at https://github.com/Zplusdragon/PLIP.
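To make the difference between identity-level and instance-level contrast concrete, the following is a minimal sketch, not the authors' implementation. It assumes a supervised-contrastive-style objective in the image-to-text direction only, where every text sharing an image's identity label counts as a positive. The function name `identity_contrastive_loss`, the `temperature` default of 0.07, and the toy embeddings are all hypothetical choices made for illustration; the loss used in PLIP may differ.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def identity_contrastive_loss(image_embs, text_embs, identity_ids, temperature=0.07):
    """Image-to-text contrastive loss with identity-level positives (assumed form).

    For image i, every text j with identity_ids[j] == identity_ids[i] is a
    positive, not only the text paired with that image. An instance-level
    loss would use j == i as the single positive.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    total = 0.0
    for i, img in enumerate(images):
        logits = [dot(img, t) / temperature for t in texts]
        # Numerically stable log-sum-exp over all texts in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        positives = [j for j, pid in enumerate(identity_ids) if pid == identity_ids[i]]
        # Average negative log-probability over all same-identity texts.
        total += -sum(logits[j] - log_denominator for j in positives) / len(positives)
    return total / len(images)


# Toy batch: the first two pairs share identity 0, the third is identity 1.
ids = [0, 0, 1]
image_embs = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
misaligned_texts = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.0]]

aligned_loss = identity_contrastive_loss(image_embs, aligned_texts, ids)
misaligned_loss = identity_contrastive_loss(image_embs, misaligned_texts, ids)
```

In this toy batch, the aligned texts give a lower loss than the misaligned ones, because both descriptions of identity 0 are pulled toward both images of identity 0 instead of being treated as negatives for each other.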

Code Repositories

zplusdragon/plip (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
nlp-based-person-retrival-on-cuhk-pedes | PLIP-RN50 | R@1: 69.23, R@5: 85.84, R@10: 91.16
person-re-identification-on-dukemtmc-reid | PLIP-RN50-MGN | mAP: 81.7
person-re-identification-on-market-1501 | PLIP-RN50-ABDNet | mAP: 91.2
text-based-person-retrieval-on-icfg-pedes | PLIP-RN50 | R@1: 64.25, R@5: 80.88, R@10: 86.32
