Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark
Yang Shuyu; Zhou Yinan; Wang Yaxiong; Wu Yujiao; Zhu Li; Zheng Zhedong

Abstract
In this paper, we introduce a large Multi-Attribute and Language Search dataset for text-based person retrieval, called MALS, and explore the feasibility of pre-training on both attribute recognition and image-text matching tasks at once. In particular, MALS contains 1,510,330 image-text pairs, about 37.5 times more than the prevailing CUHK-PEDES, and all images are annotated with 27 attributes. Considering privacy concerns and annotation costs, we leverage off-the-shelf diffusion models to generate the dataset. To verify the feasibility of learning from the generated data, we develop a new joint Attribute Prompt Learning and Text Matching Learning (APTM) framework, considering the shared knowledge between attribute and text. As the name implies, APTM contains an attribute prompt learning stream and a text matching learning stream. (1) The attribute prompt learning leverages attribute prompts for image-attribute alignment, which enhances the text matching learning. (2) The text matching learning facilitates representation learning on fine-grained details and, in turn, boosts the attribute prompt learning. Extensive experiments validate the effectiveness of pre-training on MALS, achieving state-of-the-art retrieval performance via APTM on three challenging real-world benchmarks. In particular, APTM achieves consistent improvements of +6.96%, +7.68%, and +16.95% in Recall@1 accuracy on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets, respectively.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| nlp-based-person-retrival-on-cuhk-pedes | APTM | R@1: 76.53 R@5: 90.04 R@10: 94.15 mAP: 66.91 |
| pedestrian-attribute-recognition-on-pa-100k | APTM | Accuracy: 80.17 |
| text-based-person-retrieval-on-icfg-pedes | APTM | R@1: 68.51 mAP: 41.22 |
| text-based-person-retrieval-on-rstpreid-1 | APTM | R@1: 67.50 R@5: 85.70 R@10: 91.45 |
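
The R@K figures in the table are Recall@K scores: the percentage of text queries for which at least one of the K highest-ranked gallery images shows the queried person. The following is a minimal sketch of that metric under the usual definition, not the authors' evaluation code. The function name `recall_at_k`, its parameters, and the toy similarity scores are all hypothetical and chosen only for illustration.

```python
def recall_at_k(sim, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Recall@K in percent for text-to-image retrieval.

    sim[i][j]   -- similarity of text query i to gallery image j (assumed given)
    query_ids   -- person identity of each text query
    gallery_ids -- person identity of each gallery image
    """
    hits = {k: 0 for k in ks}
    for scores, qid in zip(sim, query_ids):
        # Rank gallery indices from most to least similar (ties keep gallery order).
        ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
        for k in ks:
            # A query counts as a hit if any of its top-k images has the same identity.
            if any(gallery_ids[j] == qid for j in ranked[:k]):
                hits[k] += 1
    n = len(query_ids)
    return {k: 100.0 * hits[k] / n for k in ks}


# Toy example: 3 text queries against 3 gallery images, one image per identity.
toy_sim = [
    [0.9, 0.1, 0.3],  # query 0: correct image ranked 1st
    [0.2, 0.8, 0.5],  # query 1: correct image ranked 1st
    [0.6, 0.4, 0.1],  # query 2: correct image ranked 3rd
]
toy_result = recall_at_k(toy_sim, [0, 1, 2], [0, 1, 2], ks=(1, 2, 3))
print(toy_result)
```

In the toy example, two of the three queries retrieve the correct image at rank 1, and the third only at rank 3, so R@1 and R@2 are both two thirds and R@3 is 100%.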