Conditioned and Composed Image Retrieval Combining and Partially Fine-Tuning CLIP-Based Features

Alberto del Bimbo, Tiberio Uricchio, Marco Bertini, Alberto Baldrati

Abstract

In this paper, we present an approach for conditioned and composed image retrieval based on CLIP features. In this extension of content-based image retrieval (CBIR), a query image is combined with text that conveys the user's intent, a setting relevant to application domains such as e-commerce. The proposed method first fine-tunes the CLIP text encoder in an initial training stage that uses a simple combination of visual and textual features. In a second training stage, we learn a more complex combiner network that merges visual and textual features. Contrastive learning is used in both stages. The proposed approach obtains state-of-the-art performance for conditioned CBIR on the FashionIQ dataset and for composed CBIR on the more recent CIRR dataset.
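The first training stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say what the "simple combination" is, so the element-wise sum in `combine_sum` is an assumption, as are the function names, the temperature value, and the batch-wise cross-entropy form of the contrastive loss. The feature vectors are toy stand-ins for CLIP image and text embeddings.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def combine_sum(image_feat, text_feat):
    # Assumption: the "simple combination" is an element-wise sum of the
    # reference-image and caption features, followed by L2 normalization.
    return l2_normalize([a + b for a, b in zip(image_feat, text_feat)])

def contrastive_loss(queries, targets, temperature=0.07):
    # Batch-wise contrastive loss (assumed form): query i should be most
    # similar to target i; all other targets in the batch act as negatives.
    # The temperature value is illustrative, not taken from the paper.
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, t)) / temperature for t in targets]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)

# Toy features standing in for CLIP embeddings (hypothetical values).
ref_images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
captions = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
targets = [l2_normalize([1.0, 0.0, 1.0]), l2_normalize([0.0, 1.0, -1.0])]

queries = [combine_sum(img, txt) for img, txt in zip(ref_images, captions)]
loss_matched = contrastive_loss(queries, targets)
loss_swapped = contrastive_loss(queries, targets[::-1])
```

With correctly paired targets the loss is close to zero, while swapping the targets makes it much larger, which is the behavior a contrastive objective is meant to reward. In the second stage, a learned combiner network would presumably take the place of `combine_sum`.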

Benchmarks

Benchmark | Methodology | Metrics
image-retrieval-on-cirr | CLIP4Cir (v2) | (Recall@5+Recall_subset@1)/2: 69.09
image-retrieval-on-fashion-iq | CLIP4Cir (v2) | (Recall@10+Recall@50)/2: 50.03
image-retrieval-on-lasco | CLIP4CIR | Recall@1 (%): 4.01
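For readers unfamiliar with the averaged metrics above, the following sketch shows how a Recall@K value and a mean over several K values could be computed. It assumes one ground-truth target per query. The function names and toy rankings are hypothetical, and the subset-restricted recall used on CIRR (Recall_subset@1) is not reproduced here.

```python
def recall_at_k(ranked_lists, ground_truth, k):
    """Percentage of queries whose target appears in the top-k results."""
    hits = sum(1 for ranked, gt in zip(ranked_lists, ground_truth) if gt in ranked[:k])
    return 100.0 * hits / len(ground_truth)

def average_recall(ranked_lists, ground_truth, ks):
    """Mean of Recall@K over several cutoffs, e.g. (Recall@10 + Recall@50) / 2."""
    return sum(recall_at_k(ranked_lists, ground_truth, k) for k in ks) / len(ks)

# Toy retrieval results: one ranked list of image ids per query.
ranked = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"], ["a", "c", "b"]]
truth = ["a", "c", "b", "b"]

r1 = recall_at_k(ranked, truth, 1)           # 25.0
r2 = recall_at_k(ranked, truth, 2)           # 50.0
avg = average_recall(ranked, truth, [1, 2])  # 37.5
```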
