iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval

Agnolucci Lorenzo; Baldrati Alberto; Bertini Marco; Del Bimbo Alberto

Abstract

Given a query consisting of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption. The reliance of supervised methods on labor-intensive manually labeled datasets hinders their broad applicability. In this work, we introduce a new task, Zero-Shot CIR (ZS-CIR), that addresses CIR without the need for a labeled training dataset. We propose an approach named iSEARLE (improved zero-Shot composEd imAge Retrieval with textuaL invErsion) that involves mapping the visual information of the reference image into a pseudo-word token in CLIP token embedding space and combining it with the relative caption. To foster research on ZS-CIR, we present an open-domain benchmarking dataset named CIRCO (Composed Image Retrieval on Common Objects in context), the first CIR dataset where each query is labeled with multiple ground truths and a semantic categorization. The experimental results illustrate that iSEARLE obtains state-of-the-art performance on three different CIR datasets (FashionIQ, CIRR, and the proposed CIRCO) and two additional evaluation settings, namely domain conversion and object composition. The dataset, the code, and the model are publicly available at https://github.com/miccunifi/SEARLE.
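The composition step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the mean-pooling "text encoder", the `$` placeholder, the token table, and all vectors below are hypothetical stand-ins for CLIP components, chosen only to show how a pseudo-word embedding can sit next to ordinary caption tokens before retrieval by cosine similarity.

```python
import math
from typing import Dict, List

Vector = List[float]

PLACEHOLDER = "$"  # hypothetical marker for the pseudo-word position


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def compose_query(tokens: List[str], pseudo_word: Vector,
                  token_table: Dict[str, Vector]) -> Vector:
    """Swap the placeholder for the pseudo-word embedding, then mean-pool.

    Mean pooling is only a stand-in for a real text encoder such as CLIP's.
    """
    embedded = [pseudo_word if t == PLACEHOLDER else token_table[t] for t in tokens]
    dim = len(pseudo_word)
    return [sum(vec[i] for vec in embedded) / len(embedded) for i in range(dim)]


def retrieve(query: Vector, gallery: Dict[str, Vector]) -> List[str]:
    """Rank gallery image names by cosine similarity to the query, best first."""
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)


# Toy data (all values are made up for illustration).
token_table = {"with": [0.1, 0.1, 0.1], "snow": [0.0, 0.0, 1.0]}
pseudo_word = [1.0, 0.0, 0.0]  # pretend this encodes the reference image (a dog)
gallery = {
    "dog_grass": [1.0, 1.0, 0.0],
    "dog_snow": [1.0, 0.0, 1.0],
    "cat_snow": [0.0, 1.0, 1.0],
}

query = compose_query([PLACEHOLDER, "with", "snow"], pseudo_word, token_table)
ranking = retrieve(query, gallery)
print(ranking[0])  # dog_snow
```

In the toy gallery, the image that shares both the reference content and the caption's modification ranks first, which is the behavior CIR aims for.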

Code Repositories

miccunifi/searle (PyTorch, mentioned in GitHub)
miccunifi/circo (official, PyTorch, mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| zero-shot-composed-image-retrieval-zs-cir-on | iSEARLE-OTI (CLIP B/32) | mAP@10: 10.94 |
| zero-shot-composed-image-retrieval-zs-cir-on | iSEARLE (CLIP B/32) | mAP@10: 11.24 |
| zero-shot-composed-image-retrieval-zs-cir-on | iSEARLE-XL-OTI (CLIP L/14) | mAP@10: 12.67 |
| zero-shot-composed-image-retrieval-zs-cir-on | iSEARLE-XL (CLIP L/14) | mAP@10: 13.61 |
| zero-shot-composed-image-retrieval-zs-cir-on-1 | iSEARLE-XL-OTI (CLIP L/14) | R@5: 54.05 |
| zero-shot-composed-image-retrieval-zs-cir-on-1 | iSEARLE-OTI (CLIP B/32) | R@5: 55.18 |
| zero-shot-composed-image-retrieval-zs-cir-on-1 | iSEARLE (CLIP B/32) | R@5: 55.69 |
| zero-shot-composed-image-retrieval-zs-cir-on-1 | iSEARLE-XL (CLIP L/14) | R@5: 54.00 |
| zero-shot-composed-image-retrieval-zs-cir-on-2 | iSEARLE-XL (CLIP L/14) | (Recall@10+Recall@50)/2: 38.24 |
| zero-shot-composed-image-retrieval-zs-cir-on-2 | iSEARLE-OTI (CLIP B/32) | (Recall@10+Recall@50)/2: 34.93 |
| zero-shot-composed-image-retrieval-zs-cir-on-2 | iSEARLE-XL-OTI (CLIP L/14) | (Recall@10+Recall@50)/2: 39.39 |
| zero-shot-composed-image-retrieval-zs-cir-on-2 | iSEARLE (CLIP B/32) | (Recall@10+Recall@50)/2: 34.60 |
| zero-shot-composed-image-retrieval-zs-cir-on-4 | iSEARLE-OTI (CLIP B/32) | Actions Recall@5: 26.63 |
| zero-shot-composed-image-retrieval-zs-cir-on-4 | iSEARLE-XL-OTI (CLIP L/14) | Actions Recall@5: 32.55 |
| zero-shot-composed-image-retrieval-zs-cir-on-4 | iSEARLE-XL (CLIP L/14) | Actions Recall@5: 30.05 |
| zero-shot-composed-image-retrieval-zs-cir-on-4 | iSEARLE (CLIP B/32) | Actions Recall@5: 26.40 |
| zero-shot-composed-image-retrieval-zs-cir-on-5 | iSEARLE-XL (CLIP L/14) | Average Recall: 24.46 |
| zero-shot-composed-image-retrieval-zs-cir-on-5 | iSEARLE-XL-OTI (CLIP L/14) | Average Recall: 22.59 |
| zero-shot-composed-image-retrieval-zs-cir-on-5 | iSEARLE (CLIP B/32) | Average Recall: 16.01 |
| zero-shot-composed-image-retrieval-zs-cir-on-5 | iSEARLE-OTI (CLIP B/32) | Average Recall: 15.62 |
| zero-shot-composed-image-retrieval-zs-cir-on-6 | iSEARLE-XL (CLIP L/14) | (Recall@10+Recall@50)/2: 24.46 |
| zero-shot-composed-image-retrieval-zs-cir-on-6 | iSEARLE-OTI (CLIP B/32) | (Recall@10+Recall@50)/2: 15.62 |
| zero-shot-composed-image-retrieval-zs-cir-on-6 | iSEARLE (CLIP B/32) | (Recall@10+Recall@50)/2: 16.01 |
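
The metric names in the benchmark entries above (mAP@10, Recall@K, and the average of Recall@10 and Recall@50) can be sketched for a single query as follows. This is a minimal illustration, not the benchmarks' official evaluation code. The AP@K normalization by min(K, number of ground truths) is an assumption about how a multi-ground-truth mAP would typically be defined, and the image names are invented.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], targets: Set[str], k: int) -> float:
    """1.0 if any ground-truth image appears in the top-k results, else 0.0."""
    return 1.0 if any(item in targets for item in ranked[:k]) else 0.0


def average_precision_at_k(ranked: Sequence[str], targets: Set[str], k: int) -> float:
    """AP@k for one query that may have several ground truths.

    Assumed normalization: divide by min(k, number of ground truths).
    """
    if not targets:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in targets:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(targets))


def mean_recall_10_50(ranked: Sequence[str], targets: Set[str]) -> float:
    """(Recall@10 + Recall@50) / 2 for one query."""
    return (recall_at_k(ranked, targets, 10) + recall_at_k(ranked, targets, 50)) / 2


# Hypothetical ranking for one query with two ground-truth images.
ranked = ["img7", "img2", "img9", "img4"]
targets = {"img2", "img4"}

ap10 = average_precision_at_k(ranked, targets, 10)  # (1/2 + 2/4) / 2 = 0.5
r1 = recall_at_k(ranked, targets, 1)                # 0.0, top result is not a target
r5 = recall_at_k(ranked, targets, 5)                # 1.0
```

Dataset-level scores like those reported are then obtained by averaging the per-query values over all queries and expressing the result as a percentage.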
