Cross-modal Retrieval with Noisy Correspondence via Consistency Refining and Mining

Xi Peng, Jiancheng Lv, Peng Hu, Yunfan Li, Mouxing Yang, Xinran Ma

Abstract

The success of existing cross-modal retrieval (CMR) methods relies heavily on the assumption that the annotated cross-modal correspondence is faultless. In practice, however, the correspondence of some pairs is inevitably contaminated during data collection or annotation, which leads to the so-called Noisy Correspondence (NC) problem. To alleviate the influence of NC, we propose a novel method termed Consistency REfining And Mining (CREAM) that reveals and exploits the difference between correspondence and consistency. Specifically, correspondence and consistency coincide only for true positive and true negative pairs, while they differ for false positive and false negative pairs. Based on this observation, CREAM employs a collaborative learning paradigm to detect and rectify the correspondence of positives, and a negative mining approach to explore and utilize the consistency. Thanks to this consistency refining and mining strategy, overfitting on the false positives is prevented and the consistency rooted in the false negatives is exploited, which leads to a robust CMR method. Extensive experiments verify the effectiveness of our method on three image-text benchmarks: Flickr30K, MS-COCO, and Conceptual Captions. Furthermore, we apply our method to the graph matching task, and the results demonstrate its robustness against the fine-grained NC problem. The code is available at https://github.com/XLearning-SCU/2024-TIP-CREAM.
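The four pair types named in the abstract can be written out as a small truth table. The sketch below is illustrative only and is not taken from the CREAM code base: `PairKind` and `classify_pair` are hypothetical names, "correspondence" is read as the annotated label, and "consistency" as whether the two samples truly match. In real data the consistency is unobserved, so it has to be estimated rather than passed in.

```python
from enum import Enum


class PairKind(Enum):
    TRUE_POSITIVE = "true positive"
    FALSE_POSITIVE = "false positive"
    TRUE_NEGATIVE = "true negative"
    FALSE_NEGATIVE = "false negative"


def classify_pair(correspondence: bool, consistency: bool) -> PairKind:
    """Name a pair from its annotated label and its (assumed known) true match.

    correspondence: True if the pair is annotated as matched.
    consistency:    True if the two samples are actually semantically matched.
    """
    if correspondence and consistency:
        return PairKind.TRUE_POSITIVE
    if correspondence and not consistency:
        # Annotated as matched but actually mismatched: the noisy positives.
        return PairKind.FALSE_POSITIVE
    if not correspondence and not consistency:
        return PairKind.TRUE_NEGATIVE
    # Annotated as unmatched but actually matched: negatives worth mining.
    return PairKind.FALSE_NEGATIVE


def labels_agree(correspondence: bool, consistency: bool) -> bool:
    """Correspondence and consistency coincide only for true pairs."""
    return correspondence == consistency
```

For example, `classify_pair(True, False)` returns `PairKind.FALSE_POSITIVE`, the case in which training on the annotation alone would overfit to noise.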

Benchmarks

| Benchmark | Methodology | Image-to-text R@1 | Image-to-text R@5 | Image-to-text R@10 | Text-to-image R@1 | Text-to-image R@5 | Text-to-image R@10 | R-Sum |
|---|---|---|---|---|---|---|---|---|
| cross-modal-retrieval-with-noisy-1 | CREAM | 40.3 | 68.5 | 77.1 | 40.2 | 68.2 | 78.3 | 372.6 |
| cross-modal-retrieval-with-noisy-2 | CREAM | 77.4 | 95.0 | 97.3 | 58.7 | 84.1 | 89.8 | 502.3 |
| cross-modal-retrieval-with-noisy-3 | CREAM | 78.9 | 96.3 | 98.6 | 63.3 | 90.1 | 95.8 | 523 |

| Benchmark | Methodology | Matching accuracy |
|---|---|---|
| graph-matching-on-pascal-voc | CREAM | 0.814 |
| graph-matching-on-spair-71k | CREAM | 0.851 |
| graph-matching-on-willow-object-class | CREAM | 0.988 |
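The retrieval numbers above are consistent with R-Sum being the sum of the six recall values (three image-to-text, three text-to-image). The sketch below shows one way such metrics could be computed. It is a simplified illustration, not the evaluation code of the paper: `recall_at_k` and `r_sum` are hypothetical helpers, and the one-to-one assumption (candidate `i` is the only correct match for query `i`) is a simplification that does not hold for datasets with several captions per image.

```python
def recall_at_k(sim, k):
    """Recall@K in percent for a square similarity matrix.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption: candidate i is the single ground-truth match of query i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the true match = number of candidates scored strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def r_sum(image_to_text, text_to_image):
    """Sum of R@1, R@5, R@10 in both retrieval directions."""
    return sum(image_to_text) + sum(text_to_image)


# Values listed for cross-modal-retrieval-with-noisy-1, ordered R@1, R@5, R@10.
i2t = [40.3, 68.5, 77.1]
t2i = [40.2, 68.2, 78.3]
total = round(r_sum(i2t, t2i), 1)  # rounded to one decimal, as in the listing
```

Applied to the first benchmark row, `total` matches the listed R-Sum of 372.6; the same check holds for the other two rows.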
