5 months ago

Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval

Lin Dixuan ; Peng Yixing ; Meng Jingke ; Zheng Wei-Shi

Abstract

Text-to-image person re-identification (ReID) aims to retrieve images of a person based on a given textual description. The key challenge is to learn the relations between detailed information from visual and textual modalities. Existing works focus on learning a latent space to narrow the modality gap and further build local correspondences between two modalities. However, these methods assume that image-to-text and text-to-image associations are modality-agnostic, resulting in suboptimal associations. In this work, we show the discrepancy between image-to-text association and text-to-image association and propose CADA: Cross-Modal Adaptive Dual Association that finely builds bidirectional image-text detailed associations. Our approach features a decoder-based adaptive dual association module that enables full interaction between visual and textual modalities, allowing for bidirectional and adaptive cross-modal correspondence associations. Specifically, the paper proposes a bidirectional association mechanism: Association of text Tokens to image Patches (ATP) and Association of image Regions to text Attributes (ARA). We adaptively model the ATP based on the fact that aggregating cross-modal features based on mistaken associations will lead to feature distortion. For modeling the ARA, since the attributes are typically the first distinguishing cues of a person, we propose to explore the attribute-level association by predicting the masked text phrase using the related image region. Finally, we learn the dual associations between texts and images, and the experimental results demonstrate the superiority of our dual formulation. Codes will be made publicly available.
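
To make the idea of associating text tokens with image patches more concrete, the sketch below shows a generic scaled dot-product cross-attention step in plain Python. It is not the CADA module: the function name `cross_attend`, the toy feature vectors, and the absence of learned projections are all assumptions made for illustration, and the adaptive part of ATP described in the abstract is not modeled here.

```python
import math

def cross_attend(text_tokens, image_patches):
    """Generic cross-attention: each text token attends over all image patches.

    text_tokens and image_patches are lists of equal-length feature vectors.
    Returns (weights, aggregated): weights[i][j] is the softmax weight of
    patch j for token i, and aggregated[i] is the weighted sum of patches.
    Hypothetical helper for illustration, not the authors' implementation.
    """
    dim = len(text_tokens[0])
    weights, aggregated = [], []
    for token in text_tokens:
        # Scaled dot-product similarity between this token and every patch.
        scores = [
            sum(t * p for t, p in zip(token, patch)) / math.sqrt(dim)
            for patch in image_patches
        ]
        # Numerically stable softmax over patches.
        max_score = max(scores)
        exps = [math.exp(s - max_score) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        # Aggregate patch features with the attention weights.
        aggregated.append(
            [sum(wj * patch[d] for wj, patch in zip(w, image_patches))
             for d in range(dim)]
        )
    return weights, aggregated

# Toy example: one text token, two image patches.
weights, aggregated = cross_attend([[1.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]])
```

In this toy case almost all of the weight falls on the first patch, so the aggregated feature stays close to it. If the weights were instead concentrated on an unrelated patch, the aggregated feature would be pulled toward that patch, which is the kind of distortion from mistaken associations that the abstract refers to.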

Benchmarks

Benchmark                                  Methodology  Rank-1  Rank-5  Rank-10  mAP
nlp-based-person-retrival-on-cuhk-pedes    CADA         78.37   91.57   94.58    68.87
text-based-person-retrieval-on-icfg-pedes  CADA         67.81   82.34   87.14    39.85
text-based-person-retrieval-on-rstpreid-1  CADA         69.6    86.75   92.4     52.74
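
For readers unfamiliar with the metrics, the sketch below shows how Rank-k and mAP are commonly computed from a query-by-gallery similarity matrix. It is a minimal illustration in plain Python, not the evaluation code behind the reported numbers: the function name `rank_k_and_map`, its arguments, and the toy data are assumptions, and it assumes every query has at least one matching gallery image (a query with no match contributes an average precision of 0 here).

```python
def rank_k_and_map(similarity, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Compute Rank-k accuracy and mAP, both in percent.

    similarity[i][j] is the score of text query i against gallery image j.
    query_ids[i] and gallery_ids[j] are person identities.
    Hypothetical helper for illustration only.
    """
    hits = {k: 0 for k in ks}
    ap_sum = 0.0
    for scores, qid in zip(similarity, query_ids):
        # Sort gallery indices by descending similarity to this query.
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        matches = [gallery_ids[j] == qid for j in order]
        # Rank-k: is there at least one correct image in the top k?
        for k in ks:
            if any(matches[:k]):
                hits[k] += 1
        # Average precision: mean of precision values at each correct hit.
        num_hits = 0
        precision_sum = 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                num_hits += 1
                precision_sum += num_hits / rank
        if num_hits:
            ap_sum += precision_sum / num_hits
    n = len(query_ids)
    result = {f"Rank-{k}": 100.0 * hits[k] / n for k in ks}
    result["mAP"] = 100.0 * ap_sum / n
    return result

# Toy example: 2 text queries, 3 gallery images.
toy = rank_k_and_map(
    similarity=[[0.9, 0.1, 0.5], [0.6, 0.3, 0.1]],
    query_ids=[1, 2],
    gallery_ids=[1, 2, 1],
    ks=(1, 2),
)
print(toy)
```

In the toy example the first query ranks a correct image first, while the second query finds its only match at rank 2, so Rank-1 is 50.0, Rank-2 is 100.0, and mAP is 75.0.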
