3 months ago

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

Yuanmin Tang, Xiaoting Qin, Jue Zhang, Jing Yu, Gaopeng Gou, Gang Xiong, Qingwei Ling, Saravan Rajmohan, Dongmei Zhang, Qi Wu

Abstract

Composed Image Retrieval (CIR) aims to retrieve target images that closely resemble a reference image while integrating user-specified textual modifications, thereby capturing user intent more precisely. Existing training-free zero-shot CIR (ZS-CIR) methods often employ a two-stage process: they first generate a caption for the reference image and then use Large Language Models for reasoning to obtain a target description. However, these methods suffer from missing critical visual details and limited reasoning capabilities, leading to suboptimal retrieval performance. To address these challenges, we propose a novel, training-free one-stage method, One-Stage Reflective Chain-of-Thought Reasoning for ZS-CIR (OSrCIR), which employs Multimodal Large Language Models to retain essential visual information in a single-stage reasoning process, eliminating the information loss seen in two-stage methods. Our Reflective Chain-of-Thought framework further improves interpretative accuracy by aligning manipulation intent with contextual cues from reference images. OSrCIR achieves performance gains of 1.80% to 6.44% over existing training-free methods across multiple tasks, setting new state-of-the-art results in ZS-CIR and enhancing its utility in vision-language applications. Our code will be available at https://github.com/Pter61/osrcir2024/.
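The retrieval step that follows the reasoning stage can be illustrated with a minimal sketch. It assumes the target description produced by the multimodal model has already been encoded into a vector, and that gallery images have been encoded into the same space (the benchmark entries on this page mention CLIP backbones, but the encoder is not implemented here). The function names, the toy gallery, and the three-dimensional vectors are all hypothetical stand-ins, not the authors' code.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query_vec: Sequence[float],
                 gallery: Dict[str, Sequence[float]],
                 top_k: int) -> List[str]:
    """Return the names of the top_k gallery items most similar to query_vec."""
    scored = sorted(gallery.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]


# Hypothetical toy data: stand-ins for image embeddings and for the
# embedding of a generated target description.
toy_gallery = {
    "dog_on_beach": [0.9, 0.1, 0.0],
    "dog_on_grass": [0.1, 0.9, 0.0],
    "cat_on_sofa": [0.0, 0.1, 0.9],
}
target_description_vec = [0.8, 0.2, 0.0]

top_two = rank_gallery(target_description_vec, toy_gallery, top_k=2)
print(top_two)
```

In a real system the vectors would come from a vision-language encoder and the ranking would typically be done with batched matrix operations; the pure-Python version is only meant to make the ranking logic explicit.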

Code Repositories

Pter61/osrcir (official, PyTorch)

Benchmarks

Benchmark                                         Methodology          Metrics
zero-shot-composed-image-retrieval-zs-cir-on      OSrCIR (CLIP L/14)   mAP@10: 25.33
zero-shot-composed-image-retrieval-zs-cir-on      OSrCIR (CLIP G/14)   mAP@10: 31.14
zero-shot-composed-image-retrieval-zs-cir-on      OSrCIR (CLIP B/32)   mAP@10: 19.17
zero-shot-composed-image-retrieval-zs-cir-on-1    OSrCIR (CLIP L/14)   R@5: 57.68
zero-shot-composed-image-retrieval-zs-cir-on-1    OSrCIR (CLIP G/14)   R@5: 67.25
zero-shot-composed-image-retrieval-zs-cir-on-1    OSrCIR (CLIP B/32)   R@5: 54.54
zero-shot-composed-image-retrieval-zs-cir-on-11   OSrCIR (CLIP B/32)   A-R@1: 17.4
zero-shot-composed-image-retrieval-zs-cir-on-11   OSrCIR (CLIP L/14)   A-R@1: 17.9
zero-shot-composed-image-retrieval-zs-cir-on-11   OSrCIR (CLIP G/14)   A-R@1: 19.6
zero-shot-composed-image-retrieval-zs-cir-on-2    OSrCIR (CLIP B/32)   (Recall@10+Recall@50)/2: 42.87
zero-shot-composed-image-retrieval-zs-cir-on-2    OSrCIR (CLIP G/14)   (Recall@10+Recall@50)/2: 47.34
zero-shot-composed-image-retrieval-zs-cir-on-2    OSrCIR (CLIP L/14)   (Recall@10+Recall@50)/2: 42.82
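
For readers unfamiliar with the metric names above, the following sketch shows one common way to compute Recall@K, the averaged recall (Recall@10+Recall@50)/2, and average precision at K for a single query. Benchmark scores are these per-query values averaged over all queries and reported as percentages. The function names and the toy ranking are hypothetical, and the normalisation of AP@K by min(K, number of relevant items) is an assumption: benchmarks differ on this detail, so it may not match the exact protocol behind the numbers listed here.

```python
from typing import Hashable, Sequence, Set


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """1.0 if at least one relevant item appears in the top-k results, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def averaged_recall(ranked: Sequence[Hashable], relevant: Set[Hashable],
                    ks: Sequence[int] = (10, 50)) -> float:
    """Mean of Recall@K over the given cutoffs, e.g. (Recall@10 + Recall@50) / 2."""
    return sum(recall_at_k(ranked, relevant, k) for k in ks) / len(ks)


def average_precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable],
                           k: int) -> float:
    """AP@K, normalised by min(k, number of relevant items) (assumed convention)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(relevant))


# Hypothetical single query: a ranked result list and its ground-truth targets.
ranked_results = ["img7", "img2", "img9", "img4"]
ground_truth = {"img2", "img4"}

r_at_1 = recall_at_k(ranked_results, ground_truth, k=1)
r_at_2 = recall_at_k(ranked_results, ground_truth, k=2)
ap_at_4 = average_precision_at_k(ranked_results, ground_truth, k=4)
avg_recall = averaged_recall(ranked_results, ground_truth, ks=(1, 2))
```

Here the first relevant image sits at rank 2, so Recall@1 is 0 and Recall@2 is 1; mAP@K is then the mean of `average_precision_at_k` over all queries.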
