Cross-modal Active Complementary Learning with Self-refining Correspondence

Yang Qin, Yuan Sun, Dezhong Peng, Joey Tianyi Zhou, Xi Peng, Peng Hu

Abstract

Recently, image-text matching has attracted more and more attention from academia and industry, which is fundamental to understanding the latent correspondence across visual and textual modalities. However, most existing methods implicitly assume the training pairs are well-aligned while ignoring the ubiquitous annotation noise, a.k.a. noisy correspondence (NC), thereby inevitably leading to a performance drop. Although some methods attempt to address such noise, they still face two challenging problems: excessive memorizing/overfitting and unreliable correction for NC, especially under high noise. To address the two problems, we propose a generalized Cross-modal Robust Complementary Learning framework (CRCL), which benefits from a novel Active Complementary Loss (ACL) and an efficient Self-refining Correspondence Correction (SCC) to improve the robustness of existing methods. Specifically, ACL exploits active and complementary learning losses to reduce the risk of providing erroneous supervision, leading to theoretically and experimentally demonstrated robustness against NC. SCC utilizes multiple self-refining processes with momentum correction to enlarge the receptive field for correcting correspondences, thereby alleviating error accumulation and achieving accurate and stable corrections. We carry out extensive experiments on three image-text benchmarks, i.e., Flickr30K, MS-COCO, and CC152K, to verify the superior robustness of our CRCL against synthetic and real-world noisy correspondences.
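The abstract describes ACL and SCC only at a high level. Below is a minimal, hedged sketch in PyTorch of how those two ideas could be realized; it is not the code from the linked repository. The batch similarity matrix `sim`, the per-pair soft correspondence labels, and the hyper-parameters `tau` and `momentum` are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' implementation) of the two ideas in the
# abstract: (1) a complementary-style loss that partly relies on "these pairs do
# NOT match" supervision instead of fully trusting the annotated positives, and
# (2) momentum (EMA) self-refinement of soft correspondence labels.
import torch
import torch.nn.functional as F


def complementary_loss(sim, soft_labels, tau=0.05):
    """Robust matching loss on a (B, B) image-text similarity matrix.

    sim:         cosine similarities; entry (i, j) compares image i and text j.
    soft_labels: (B,) confidence in [0, 1] that the annotated pair (i, i) is a
                 true correspondence (in practice stored per training pair and
                 indexed by sample id).
    """
    probs = F.softmax(sim / tau, dim=1)                 # image-to-text matching probabilities
    eye = torch.eye(sim.size(0), device=sim.device)

    # Positive term: trust the annotated diagonal pair, weighted by the current
    # confidence that it is clean.
    pos = -soft_labels * torch.log(probs.diagonal() + 1e-8)

    # Complementary term: only asserts that off-diagonal pairs should NOT match,
    # which stays valid even when the annotated positive is noisy, so it carries
    # a lower risk of providing erroneous supervision.
    comp = -((1.0 - eye) * torch.log(1.0 - probs + 1e-8)).sum(dim=1) / (sim.size(0) - 1)

    # Low-confidence pairs lean on the complementary term, high-confidence pairs
    # on the positive term.
    return (pos + (1.0 - soft_labels) * comp).mean()


@torch.no_grad()
def momentum_refine(soft_labels, sim, momentum=0.9, tau=0.05):
    """EMA-style self-refinement of the soft correspondence labels."""
    evidence = F.softmax(sim / tau, dim=1).diagonal()   # current evidence that (i, i) matches
    return momentum * soft_labels + (1.0 - momentum) * evidence
```

In such a setup, `complementary_loss` would drive each training step, while `momentum_refine` would be applied periodically (e.g., once per epoch) to smooth the correspondence estimates over time, which is the role the abstract attributes to SCC's momentum correction.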

Code Repositories

qinyang79/crcl (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
cross-modal-retrieval-with-noisy-1 | CRCL | Image-to-text R@1: 41.8, R@5: 67.4, R@10: 76.5; Text-to-image R@1: 41.6, R@5: 68.0, R@10: 78.4; R-Sum: 373.7
cross-modal-retrieval-with-noisy-2 | CRCL | Image-to-text R@1: 77.9, R@5: 95.4, R@10: 98.3; Text-to-image R@1: 60.9, R@5: 84.7, R@10: 90.6; R-Sum: 507.8
cross-modal-retrieval-with-noisy-3 | CRCL | Image-to-text R@1: 79.6, R@5: 96.1, R@10: 98.7; Text-to-image R@1: 64.7, R@5: 90.6, R@10: 95.9; R-Sum: 525.6
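R-Sum here is the usual aggregate in image-text retrieval, i.e., the sum of the six recall values (R@1/R@5/R@10 in both directions); a quick check against the numbers above:

```python
# Verify that each reported R-Sum equals the sum of its six recall values.
rows = {
    "cross-modal-retrieval-with-noisy-1": ([41.8, 67.4, 76.5], [41.6, 68.0, 78.4], 373.7),
    "cross-modal-retrieval-with-noisy-2": ([77.9, 95.4, 98.3], [60.9, 84.7, 90.6], 507.8),
    "cross-modal-retrieval-with-noisy-3": ([79.6, 96.1, 98.7], [64.7, 90.6, 95.9], 525.6),
}
for name, (i2t, t2i, r_sum) in rows.items():
    total = sum(i2t) + sum(t2i)
    assert abs(total - r_sum) < 1e-6, name
    print(f"{name}: R-Sum {total:.1f} matches reported {r_sum}")
```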
