5 months ago

Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

Liu Zheyuan ; Sun Weixuan ; Teney Damien ; Gould Stephen

Abstract

Composed image retrieval aims to find an image that best matches a given multi-modal user query consisting of a reference image and text pair. Existing methods commonly pre-compute image embeddings over the entire corpus and compare these to a reference image embedding modified by the query text at test time. Such a pipeline is very efficient at test time since fast vector distances can be used to evaluate candidates, but modifying the reference image embedding guided only by a short textual description can be difficult, especially independent of potential candidates. An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set. Though this approach is more discriminative, for large-scale datasets the computational cost is prohibitive since pre-computation of candidate embeddings is no longer possible. We propose to combine the merits of both schemes using a two-stage model. Our first stage adopts the conventional vector distancing metric and performs a fast pruning among candidates. Meanwhile, our second stage employs a dual-encoder architecture, which effectively attends to the input triplet of reference-text-candidate and re-ranks the candidates. Both stages utilize a vision-and-language pre-trained network, which has proven beneficial for various downstream tasks. Our method consistently outperforms state-of-the-art approaches on standard benchmarks for the task. Our implementation is available at https://github.com/Cuberick-Orion/Candidate-Reranking-CIR.
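The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity as the stage-one vector distance, and the `rerank_score` callable (a stand-in for the expensive scoring of reference-text-candidate triplets) are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_retrieve(
    query_emb: Vector,
    corpus_embs: Sequence[Vector],
    rerank_score: Callable[[int], float],
    k: int,
) -> List[int]:
    """Return the indices of the top-k candidates after re-ranking.

    Stage 1 compares one query embedding against pre-computed candidate
    embeddings, which is cheap. Stage 2 calls the (hypothetical) expensive
    scorer only on the k candidates that survive stage 1.
    """
    # Stage 1: rank the whole corpus by vector similarity and keep the top k.
    by_similarity = sorted(
        range(len(corpus_embs)),
        key=lambda i: cosine(query_emb, corpus_embs[i]),
        reverse=True,
    )
    shortlist = by_similarity[:k]

    # Stage 2: re-rank only the shortlist with the expensive scorer.
    return sorted(shortlist, key=rerank_score, reverse=True)
```

With a corpus of N candidates, the expensive scorer runs k times per query rather than N times, which is where the claimed efficiency gain comes from. Returning only the re-ranked shortlist is a choice of this sketch.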

Code Repositories

Cuberick-Orion/Bi-Blip4CIR (pytorch; mentioned in GitHub)
Cuberick-Orion/Candidate-Reranking-CIR (official; pytorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
image-retrieval-on-cirr | Candidate Set Re-ranking | (Recall@5+Recall_subset@1)/2: 80.9; Recall@10: 89.78
image-retrieval-on-fashion-iq | Candidate Set Re-ranking | (Recall@10+Recall@50)/2: 62.15; Recall@10: 51.17
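
For readers unfamiliar with the metrics above, the following sketch shows one common way to compute Recall@K and an average over several K values, such as (Recall@10+Recall@50)/2. It assumes each query has exactly one target image and that results are given as ranked lists of candidate IDs; the function names are hypothetical, and the subset variant (Recall_subset@1) is not covered.

```python
from typing import Hashable, Sequence


def recall_at_k(
    rankings: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    k: int,
) -> float:
    """Percentage of queries whose target appears in the top-k results."""
    hits = sum(
        1 for ranked, target in zip(rankings, targets) if target in ranked[:k]
    )
    return 100.0 * hits / len(targets)


def mean_recall(
    rankings: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    ks: Sequence[int],
) -> float:
    """Average of Recall@K over the given K values."""
    return sum(recall_at_k(rankings, targets, k) for k in ks) / len(ks)
```

Under these assumptions, `mean_recall(rankings, targets, ks=(10, 50))` corresponds to the averaged Fashion-IQ figure in the table.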
