5 months ago

Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models

Liu Zheyuan ; Rodriguez-Opazo Cristian ; Teney Damien ; Gould Stephen

Abstract

We extend the task of composed image retrieval, where an input query consists of an image and short textual description of how to modify the image. Existing methods have only been applied to non-complex images within narrow domains, such as fashion products, thereby limiting the scope of study on in-depth visual reasoning in rich image and language contexts. To address this issue, we collect the Compose Image Retrieval on Real-life images (CIRR) dataset, which consists of over 36,000 pairs of crowd-sourced, open-domain images with human-generated modifying text. To extend current methods to the open-domain, we propose CIRPLANT, a transformer based model that leverages rich pre-trained vision-and-language (V&L) knowledge for modifying visual features conditioned on natural language. Retrieval is then done by nearest neighbor lookup on the modified features. We demonstrate that with a relatively simple architecture, CIRPLANT outperforms existing methods on open-domain images, while matching state-of-the-art accuracy on the existing narrow datasets, such as fashion. Together with the release of CIRR, we believe this work will inspire further research on composed image retrieval.
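
The nearest neighbor lookup mentioned in the abstract can be illustrated with a minimal sketch. This is not the CIRPLANT implementation: the abstract does not state which distance is used, so cosine similarity is an assumption here, and the names `cosine_similarity` and `retrieve_nearest` are hypothetical. The sketch takes a query feature vector that has already been modified by the text and ranks candidate image features against it.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_nearest(modified_query, candidates, k=5):
    """Return the indices of the k candidates most similar to the query.

    modified_query: feature vector produced after conditioning on the text
                    (assumed to be given; producing it is out of scope here).
    candidates:     list of candidate image feature vectors.
    """
    scored = [
        (cosine_similarity(modified_query, feat), idx)
        for idx, feat in enumerate(candidates)
    ]
    # Highest similarity first; ties are broken by the lower candidate index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]
```

For example, `retrieve_nearest([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2)` ranks candidate 0 first and candidate 2 second. In practice the candidate features would be precomputed once and the lookup vectorized, but the ranking logic is the same.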

Code Repositories

Cuberick-Orion/CIRR: official; mentioned in GitHub
naver/artemis: PyTorch; mentioned in GitHub
Cuberick-Orion/CIRPLANT: official; PyTorch; mentioned in GitHub

Benchmarks

Benchmark: image-retrieval-on-cirr
Methodology: CIRPLANT
Metrics: (Recall@5+Recall_subset@1)/2: 45.88
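
As a rough illustration of how the combined metric (Recall@5+Recall_subset@1)/2 could be computed, here is a minimal sketch. It assumes that Recall@K is the percentage of queries whose target image appears in the top K ranked candidates, and that Recall_subset@1 is the same quantity at K=1 when ranking is restricted to a smaller candidate subset per query; this page does not define either term, so both are assumptions. The function names are hypothetical.

```python
def recall_at_k(rankings, targets, k):
    """Percentage of queries whose target appears in the top-k ranked candidates.

    rankings: one ranked list of candidate ids per query (best first).
    targets:  the ground-truth candidate id for each query.
    """
    hits = sum(
        1 for ranked, target in zip(rankings, targets) if target in ranked[:k]
    )
    return 100.0 * hits / len(targets)


def combined_score(full_rankings, subset_rankings, targets):
    """Average of Recall@5 over the full candidate set and Recall@1 over the
    restricted per-query subset (assumed meaning of Recall_subset@1)."""
    recall_5 = recall_at_k(full_rankings, targets, 5)
    recall_subset_1 = recall_at_k(subset_rankings, targets, 1)
    return (recall_5 + recall_subset_1) / 2
```

With two queries where one target falls in the full top 5 and both targets rank first within their subsets, Recall@5 is 50.0, Recall_subset@1 is 100.0, and the combined score is 75.0.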
