Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder
Liu Zheyuan; Sun Weixuan; Teney Damien; Gould Stephen

Abstract
Composed image retrieval aims to find an image that best matches a given multi-modal user query consisting of a reference image and text pair. Existing methods commonly pre-compute image embeddings over the entire corpus and compare these to a reference image embedding modified by the query text at test time. Such a pipeline is very efficient at test time since fast vector distances can be used to evaluate candidates, but modifying the reference image embedding guided only by a short textual description can be difficult, especially independent of potential candidates. An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set. Though this approach is more discriminative, for large-scale datasets the computational cost is prohibitive since pre-computation of candidate embeddings is no longer possible. We propose to combine the merits of both schemes using a two-stage model. Our first stage adopts the conventional vector distancing metric and performs a fast pruning among candidates. Meanwhile, our second stage employs a dual-encoder architecture, which effectively attends to the input triplet of reference-text-candidate and re-ranks the candidates. Both stages utilize a vision-and-language pre-trained network, which has proven beneficial for various downstream tasks. Our method consistently outperforms state-of-the-art approaches on standard benchmarks for the task. Our implementation is available at https://github.com/Cuberick-Orion/Candidate-Reranking-CIR.
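The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`first_stage`, `rerank`, `two_stage_retrieve`), the use of cosine similarity as the vector distance, and the `score_triplet` callable standing in for the dual-encoder re-ranker are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_stage(query_emb: Vector, corpus_embs: List[Vector], k: int) -> List[int]:
    """Stage 1 (assumed form): rank pre-computed candidate embeddings by
    similarity to the text-modified query embedding and keep the top k."""
    scores = [cosine(query_emb, c) for c in corpus_embs]
    order = sorted(range(len(corpus_embs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def rerank(candidate_ids: List[int], score_triplet: Callable[[int], float]) -> List[int]:
    """Stage 2 (assumed form): re-order only the shortlist using a more
    expensive scorer. `score_triplet` is a hypothetical stand-in for a model
    that scores a reference-text-candidate triplet given the candidate index."""
    return sorted(candidate_ids, key=score_triplet, reverse=True)


def two_stage_retrieve(
    query_emb: Vector,
    corpus_embs: List[Vector],
    score_triplet: Callable[[int], float],
    k: int,
) -> List[int]:
    """Prune the corpus with cheap vector similarity, then re-rank the survivors."""
    shortlist = first_stage(query_emb, corpus_embs, k)
    return rerank(shortlist, score_triplet)


# Toy data: four 2-D candidate embeddings and a made-up triplet scorer.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.0]
toy_scores = {0: 0.2, 1: 0.9, 3: 0.5}


def toy_scorer(i: int) -> float:
    return toy_scores.get(i, 0.0)


ranking = two_stage_retrieve(query, corpus, toy_scorer, k=2)
```

In this toy run the expensive scorer is called only for the `k` shortlisted candidates rather than for the whole corpus, which is the cost trade-off the abstract describes.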
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| image-retrieval-on-cirr | Candidate Set Re-ranking | (Recall@5+Recall_subset@1)/2: 80.9; Recall@10: 89.78 |
| image-retrieval-on-fashion-iq | Candidate Set Re-ranking | (Recall@10+Recall@50)/2: 62.15; Recall@10: 51.17 |
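For readers unfamiliar with the metrics in the table, the following is a minimal sketch of the usual Recall@K definition (the percentage of queries whose target appears among the top K results) and of an averaged metric such as (Recall@10+Recall@50)/2. The function name `recall_at_k` and the toy rankings are assumptions for illustration; the benchmarks' exact evaluation protocols, such as the CIRR subset setting, are not reproduced here.

```python
from typing import List


def recall_at_k(rankings: List[List[int]], targets: List[int], k: int) -> float:
    """Percentage of queries whose target image id appears in the top-k results."""
    hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
    return 100.0 * hits / len(targets)


# Toy example: three queries, each with a ranked list of candidate ids.
rankings = [[3, 1, 2], [2, 0, 1], [1, 2, 0]]
targets = [1, 1, 0]

r_at_2 = recall_at_k(rankings, targets, k=2)
r_at_3 = recall_at_k(rankings, targets, k=3)

# An averaged metric in the style of the table's "(Recall@10+Recall@50)/2".
averaged = (r_at_2 + r_at_3) / 2
```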