Zero-Shot Composed Image Retrieval (ZS-CIR) On

Evaluation Metrics

mAP@10
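
As a rough illustration of how a metric like mAP@10 can be computed, here is a minimal sketch. The function names are hypothetical, and the normalization by min(k, number of relevant items) is an assumption: benchmarks differ on this detail, so check the official evaluation script before comparing numbers. It also assumes each ranked list contains no duplicate items.

```python
from typing import Hashable, Sequence, Set


def average_precision_at_k(
    ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 10
) -> float:
    """AP@k for a single query, given a ranked result list and its ground-truth set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    # Assumption: normalize by min(k, |relevant|); other definitions exist.
    return precision_sum / min(k, len(relevant))


def mean_average_precision_at_k(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
    k: int = 10,
) -> float:
    """mAP@k: the mean of AP@k over all queries (0.0 if there are no queries)."""
    scores = [average_precision_at_k(r, g, k) for r, g in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0
```

The functions return values in the range 0 to 1; leaderboards often report the same quantity multiplied by 100.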

Evaluation Results

Performance of each model on this benchmark

| Model | mAP@10 | Paper Title | Repository |
| --- | --- | --- | --- |
| MMRet-MLLM | 43.4 | MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | |
| MMRet-Large (CLIP L/14) | 40.2 | MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | |
| SEIZE (CLIP G/14 & GPT-4o) | 37.23 | Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | - |
| MagicLens (CoCa L) | 35.4 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | |
| MMRet-Base (CLIP B/16) | 35.0 | MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | |
| IP-CIR + LDRE (CLIP G/14) | 34.26 | Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy | - |
| SEIZE (CLIP G/14) | 33.77 | Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | - |
| LDRE (CLIP G/14) | 32.24 | LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | - |
| MagicLens (CoCa B) | 32.0 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | |
| OSrCIR (CLIP G/14) | 31.14 | Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | |
| MagicLens (CLIP L) | 30.8 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | |
| CoVR-BLIP-2 | 29.55 | CoVR-2: Automatic Data Construction for Composed Video Retrieval | |
| ImageScope (CLIP-ViT-L/14) | 28.36 | ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning | |
| CIReVL (CLIP G/14) | 27.59 | Vision-by-Language for Training-Free Compositional Image Retrieval | |
| IP-CIR + LDRE (CLIP L/14) | 27.41 | Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy | - |
| SEIZE (CLIP L/14) | 25.82 | Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | - |
| OSrCIR (CLIP L/14) | 25.33 | Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | |
| LDRE (CLIP L/14) | 24.03 | LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | - |
| MagicLens (CLIP B) | 23.8 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | |
| RTD + LinCIR (CLIP G/14) | 22.29 | An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval | |