See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval
Shu Xiujun; Wen Wei; Wu Haoqian; Chen Keyu; Song Yiran; Qiao Ruizhi; Ren Bo; Wang Xiao

Abstract
Text-based person retrieval aims to find the query person based on a textual description. The key is to learn a common latent space mapping between visual-textual modalities. To achieve this goal, existing works employ segmentation to obtain explicit cross-modal alignments or utilize attention to explore salient alignments. These methods have two shortcomings: 1) Labeling cross-modal alignments is time-consuming. 2) Attention methods can explore salient cross-modal alignments but may ignore some subtle and valuable pairs. To relieve these issues, we introduce an Implicit Visual-Textual (IVT) framework for text-based person retrieval. Different from previous models, IVT utilizes a single network to learn representations for both modalities, which contributes to the visual-textual interaction. To explore the fine-grained alignment, we further propose two implicit semantic alignment paradigms: multi-level alignment (MLA) and bidirectional mask modeling (BMM). The MLA module explores finer matching at sentence, phrase, and word levels, while the BMM module aims to mine more semantic alignments between visual and textual modalities. Extensive experiments are carried out to evaluate the proposed IVT on public datasets, i.e., CUHK-PEDES, RSTPReID, and ICFG-PEDES. Even without explicit body part alignment, our approach still achieves state-of-the-art performance. Code is available at: https://github.com/TencentYoutuResearch/PersonRetrieval-IVT.
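To make the idea of retrieval in a common latent space concrete, the following is a minimal sketch, not the IVT implementation. It assumes that a text query and every gallery image have already been mapped to vectors of the same dimension, and it ranks the gallery by cosine similarity. The function names `cosine` and `rank_gallery` and the toy embeddings are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(text_emb, image_embs):
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [cosine(text_emb, img) for img in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)


# Toy embeddings, made up for illustration (not produced by any real model).
text_emb = [1.0, 0.0, 1.0]
image_embs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.1, 0.9],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # partially aligned with the query
]

ranking = rank_gallery(text_emb, image_embs)
print(ranking)
```

In this toy setting the nearly parallel image is ranked first and the orthogonal one last. How the embeddings are learned (for example through MLA and BMM) is exactly what the paper addresses and is not covered by this sketch.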
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| text-based-person-retrieval-with-noisy | IVT | Rank-1: 58.59, Rank-5: 78.51, Rank-10: 85.61, mAP: 57.19, mINP: 45.78 |
| text-based-person-retrieval-with-noisy-1 | IVT | Rank-1: 50.21, Rank-5: 69.14, Rank-10: 76.18, mAP: 34.72, mINP: 8.77 |
| text-based-person-retrieval-with-noisy-2 | IVT | Rank-1: 43.65, Rank-5: 66.50, Rank-10: 75.70, mAP: 37.22, mINP: 20.47 |
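
The page does not state how the metrics in the table are computed. As a rough sketch, assuming the definitions commonly used in person re-identification evaluation, Rank-k is the share of queries with at least one correct match in the top k results, mAP is the mean of per-query average precision, and mINP is the mean of the number of correct matches divided by the rank of the last correct match. The function names and the toy match lists below are hypothetical.

```python
def rank_k_hit(matches, k):
    """1.0 if any correct match appears in the top k of the ranked list, else 0.0."""
    return 1.0 if any(matches[:k]) else 0.0


def average_precision(matches):
    """Mean of the precision values at each position holding a correct match."""
    hits, precisions = 0, []
    for pos, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / pos)
    return sum(precisions) / len(precisions) if precisions else 0.0


def inverse_negative_penalty(matches):
    """Number of correct matches divided by the rank of the last (hardest) one."""
    positions = [pos for pos, m in enumerate(matches, start=1) if m]
    return len(positions) / positions[-1] if positions else 0.0


def evaluate(all_matches, ks=(1, 5, 10)):
    """Aggregate per-query results into percentages, one ranked list per query."""
    n = len(all_matches)
    result = {
        f"Rank-{k}": 100 * sum(rank_k_hit(m, k) for m in all_matches) / n
        for k in ks
    }
    result["mAP"] = 100 * sum(average_precision(m) for m in all_matches) / n
    result["mINP"] = 100 * sum(inverse_negative_penalty(m) for m in all_matches) / n
    return result


# Each inner list marks, in ranked order, whether a gallery item is a correct match.
all_matches = [
    [True, False, True, False],   # AP = (1/1 + 2/3) / 2, INP = 2/3
    [False, True, False, False],  # AP = 1/2, INP = 1/2
]

metrics = evaluate(all_matches)
print(metrics)
```

A low mINP next to a reasonable Rank-1, as in the second benchmark row, would under these definitions indicate that the hardest correct matches tend to sit far down the ranked list.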