HyperAI

5 months ago

CoVR-2: Automatic Data Construction for Composed Video Retrieval

Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol

Abstract

Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers text and image queries together to search for relevant images in a database. Most CoIR approaches require manually annotated datasets of image-text-image triplets, where the text describes a modification from the query image to the target image. However, manually curating CoIR triplets is expensive and limits scalability.

In this work, we instead propose a scalable, automatic dataset creation methodology that generates triplets from video-caption pairs, while also extending the task to composed video retrieval (CoVR). To this end, we mine pairs of videos with similar captions from a large database and use a large language model to generate the corresponding modification text. Applying this methodology to the large WebVid2M collection, we automatically construct our WebVid-CoVR dataset of 1.6 million triplets. We also introduce a new CoVR benchmark with a manually annotated evaluation set, along with baseline results. We further show that the methodology applies equally to image-caption pairs by generating 3.3 million CoIR training triplets from the Conceptual Captions dataset.

Our model builds on BLIP-2 pretraining, adapts it to composed video (or image) retrieval, and adds a caption retrieval loss to exploit supervision beyond the triplet. We provide extensive ablations of the design choices on our new CoVR benchmark. Our experiments also show that a CoVR model trained on our datasets transfers effectively to CoIR, improving the state of the art in the zero-shot setting on the CIRR, FashionIQ, and CIRCO benchmarks. Our code, datasets, and models are publicly available at https://imagine.enpc.fr/~ventural/covr/.
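The pair-mining step described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes that two captions count as "similar" when they have the same length and differ in exactly one word, it works on an in-memory dictionary of hypothetical video ids, and it replaces the large language model with a hypothetical template function, `make_modification_text`. Any quality filtering of the mined pairs is omitted.

```python
from collections import defaultdict
from itertools import combinations


def mine_pairs(captions):
    """Find directed caption pairs that differ in exactly one word.

    captions maps a (hypothetical) video id to its caption. Returns tuples
    (query_id, target_id, query_word, target_word).
    """
    # Key each caption once per word position, with that position masked out.
    # Two captions share a key only if they have the same length and agree on
    # every word except the masked one.
    buckets = defaultdict(list)
    for vid, caption in captions.items():
        words = caption.lower().split()
        for i, word in enumerate(words):
            key = (i, tuple(words[:i]), tuple(words[i + 1:]))
            buckets[key].append((vid, word))

    pairs = []
    for members in buckets.values():
        for (a, word_a), (b, word_b) in combinations(members, 2):
            if word_a == word_b:
                continue  # identical captions carry no modification
            # Each mined pair yields a triplet in both directions.
            pairs.append((a, b, word_a, word_b))
            pairs.append((b, a, word_b, word_a))
    return pairs


def make_modification_text(query_word, target_word):
    """Hypothetical template standing in for the LLM-generated text."""
    return f"Change {query_word} to {target_word}"


def build_triplets(captions):
    """Return (query video, modification text, target video) triplets."""
    return [
        (query, make_modification_text(qw, tw), target)
        for query, target, qw, tw in mine_pairs(captions)
    ]


if __name__ == "__main__":
    toy = {
        "v1": "a dog runs on the beach",
        "v2": "a cat runs on the beach",
        "v3": "a dog sleeps on the beach",
        "v4": "a red car",
    }
    for triplet in build_triplets(toy):
        print(triplet)
```

Grouping captions by masked word position means only captions that land in the same bucket are compared, instead of every pair across a collection the size of WebVid2M. In the toy example, "v1" pairs with "v2" (dog/cat) and with "v3" (runs/sleeps), while "v2" and "v3" differ in two words and are not paired.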

Code Repositories

lucas-ventura/CoVR (official, PyTorch)

Benchmarks

Benchmark | Method | Metrics
composed-image-retrieval-coir-on-cirr-1 | CoVR-BLIP | (Recall@5+Recall_subset@1)/2: 76.81
composed-image-retrieval-coir-on-cirr-1 | CoVR-BLIP-2 | R@1: 50.43; R@5: 81.08
composed-image-retrieval-coir-on-fashion-iq | CoVR-BLIP-2 | (Recall@10+Recall@50)/2: 60.57; R@10: 49.96; R@50: 71.17
composed-video-retrieval-covr-on-covr | CoVR-BLIP | R@5: 79.93
composed-video-retrieval-covr-on-covr | BLIP-2 | R@1: 59.82
zero-shot-composed-image-retrieval-zs-cir-on | CoVR-BLIP-2 | mAP@10: 29.55
zero-shot-composed-image-retrieval-zs-cir-on-1 | CoVR-BLIP-2 | R@1: 43.74; R@5: 73.61; R@10: 83.95; R@50: 96.1
zero-shot-composed-image-retrieval-zs-cir-on-2 | CoVR-BLIP-2 | (Recall@10+Recall@50)/2: 48.3; R@10: 38.15; R@50: 58.44
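Most metrics in the benchmarks table are Recall@K: the percentage of queries whose correct target appears among the K highest-scoring gallery items. Several rows also report the mean of two such recalls. The following minimal Python sketch assumes exactly one ground-truth target per query and a precomputed query-to-gallery similarity matrix; the function names are hypothetical, and the other reported metrics, Recall_subset@1 and mAP@10, are computed differently and are not covered here.

```python
def recall_at_k(similarity, targets, k):
    """Percentage of queries whose ground-truth item is ranked in the top k.

    similarity[q][g] scores query q against gallery item g (higher is better);
    targets[q] is the index of the single correct gallery item for query q.
    """
    hits = 0
    for scores, target in zip(similarity, targets):
        # Gallery indices sorted by descending score; ties keep index order.
        ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


def mean_recall(similarity, targets, ks=(10, 50)):
    """Mean of several Recall@K values, e.g. (Recall@10 + Recall@50) / 2."""
    return sum(recall_at_k(similarity, targets, k) for k in ks) / len(ks)


if __name__ == "__main__":
    # Toy data: 2 queries scored against 3 gallery items.
    sim = [[0.9, 0.1, 0.5], [0.2, 0.3, 0.8]]
    gt = [0, 1]
    print(recall_at_k(sim, gt, 1))  # 50.0
    print(recall_at_k(sim, gt, 2))  # 100.0
    print(mean_recall(sim, gt, ks=(1, 2)))  # 75.0
```

As an arithmetic check on the table, the FashionIQ row's averaged value is consistent with its two recalls: (49.96 + 71.17) / 2 = 60.565, which rounds to the reported 60.57.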
