4 months ago

Zero-Shot Composed Image Retrieval with Textual Inversion

Abstract

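Composed Image Retrieval (CIR) aims to retrieve a target image given a query made of a reference image and a relative caption that describes the difference between the two images. Existing CIR methods rely on supervised learning, and the high cost and effort of labeling CIR datasets hamper their widespread use. In this work, we propose a new task, Zero-Shot CIR (ZS-CIR), which aims to address CIR without requiring a labeled training dataset. Our approach, named zero-Shot composEd imAge Retrieval with textuaL invErsion (SEARLE), maps the visual features of the reference image into a pseudo-word token in the CLIP token embedding space and integrates it with the relative caption. To support research on ZS-CIR, we introduce an open-domain benchmarking dataset named Composed Image Retrieval on Common Objects in context (CIRCO), the first CIR dataset containing multiple ground truths for each query. Experiments show that SEARLE outperforms the baselines on the two main CIR datasets, FashionIQ and CIRR, and on the proposed CIRCO. The dataset, code, and model are publicly available at https://github.com/miccunifi/SEARLE.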
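The core idea in the abstract, mapping image features to a pseudo-word token and placing it alongside the relative caption, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the single linear layer standing in for the mapping network (`phi`), the 4-dimensional embeddings, the random toy vocabulary, the `$` placeholder, and the prompt template "a photo of $ that ...". None of it is taken from the SEARLE code, and a real system would feed the resulting sequence to the CLIP text encoder.

```python
import random

EMBED_DIM = 4  # toy size; an assumption, real CLIP token embeddings are far larger


def make_linear(in_dim, out_dim, seed=0):
    """Random weight matrix for a hypothetical one-layer mapping network."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def phi(image_features, weights):
    """Map image features to one pseudo-word token embedding (illustrative only)."""
    return [sum(w * x for w, x in zip(row, image_features)) for row in weights]


def build_query_tokens(relative_caption, placeholder="$"):
    """Assumed prompt template: the placeholder marks where the pseudo-word goes."""
    return f"a photo of {placeholder} that {relative_caption}".split()


def embed_tokens(tokens, vocab_embeddings, pseudo_token, placeholder="$"):
    """Look up ordinary tokens; substitute the pseudo-word embedding at the placeholder."""
    return [pseudo_token if t == placeholder else vocab_embeddings[t] for t in tokens]


# Toy reference-image features (an assumption; normally produced by the CLIP image encoder).
image_features = [0.5, -0.2, 0.1, 0.9]
weights = make_linear(len(image_features), EMBED_DIM)
pseudo_token = phi(image_features, weights)

tokens = build_query_tokens("is red")
rng = random.Random(1)
vocab = {t: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for t in tokens if t != "$"}
sequence = embed_tokens(tokens, vocab, pseudo_token)
```

In this sketch `sequence` holds one embedding per token, with the image-derived pseudo-word sitting where `$` appeared in the prompt, so the reference image and the relative caption end up in a single text-side query.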

Code Repositories

miccunifi/searle (official, PyTorch, mentioned on GitHub)
miccunifi/circo (official, PyTorch, mentioned on GitHub)

Benchmarks

Benchmark | Method | Metric
zero-shot-composed-image-retrieval-zs-cir-on | SEARLE-XL (CLIP L/14) | mAP@10: 12.73
zero-shot-composed-image-retrieval-zs-cir-on | SEARLE (CLIP B/32) | mAP@10: 9.94
zero-shot-composed-image-retrieval-zs-cir-on-1 | SEARLE | R@5: 53.42
zero-shot-composed-image-retrieval-zs-cir-on-1 | SEARLE-XL | R@5: 52.48
zero-shot-composed-image-retrieval-zs-cir-on-11 | SEARLE (CLIP B/32) | A-R@1: 14.4
zero-shot-composed-image-retrieval-zs-cir-on-11 | SEARLE (CLIP L/14) | A-R@1: 14.4
zero-shot-composed-image-retrieval-zs-cir-on-2 | SEARLE (CLIP B/32) | (Recall@10+Recall@50)/2: 32.71
zero-shot-composed-image-retrieval-zs-cir-on-2 | SEARLE-XL-OTI (CLIP L/14) | (Recall@10+Recall@50)/2: 37.76
zero-shot-composed-image-retrieval-zs-cir-on-2 | SEARLE-XL (CLIP L/14) | (Recall@10+Recall@50)/2: 35.90
zero-shot-composed-image-retrieval-zs-cir-on-2 | SEARLE-OTI (CLIP B/32) | (Recall@10+Recall@50)/2: 32.39
zero-shot-composed-image-retrieval-zs-cir-on-3 | SEARLE-XL-OTI | R@10: 27.61
zero-shot-composed-image-retrieval-zs-cir-on-4 | SEARLE (CLIP B/32) | Actions Recall@5: 24.58
zero-shot-composed-image-retrieval-zs-cir-on-4 | SEARLE-OTI (CLIP B/32) | Actions Recall@5: 26.00
zero-shot-composed-image-retrieval-zs-cir-on-4 | SEARLE-XL-OTI (CLIP L/14) | Actions Recall@5: 31.43
zero-shot-composed-image-retrieval-zs-cir-on-4 | SEARLE-XL (CLIP L/14) | Actions Recall@5: 29.02
zero-shot-composed-image-retrieval-zs-cir-on-5 | SEARLE-OTI (CLIP B/32) | Average Recall: 12.77
zero-shot-composed-image-retrieval-zs-cir-on-5 | SEARLE-XL-OTI (CLIP B/32) | Average Recall: 20.42
zero-shot-composed-image-retrieval-zs-cir-on-5 | SEARLE-XL (CLIP L/14) | Average Recall: 21.54
zero-shot-composed-image-retrieval-zs-cir-on-5 | SEARLE (CLIP B/32) | Average Recall: 11.94
zero-shot-composed-image-retrieval-zs-cir-on-6 | SEARLE-OTI (CLIP B/32) | (Recall@10+Recall@50)/2: 12.77
zero-shot-composed-image-retrieval-zs-cir-on-6 | SEARLE-XL-OTI (CLIP B/32) | (Recall@10+Recall@50)/2: 20.42
zero-shot-composed-image-retrieval-zs-cir-on-6 | SEARLE (CLIP B/32) | (Recall@10+Recall@50)/2: 11.94
zero-shot-composed-image-retrieval-zs-cir-on-6 | SEARLE-XL (CLIP L/14) | (Recall@10+Recall@50)/2: 21.54
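As a rough guide to reading the metrics listed above, the sketch below computes Recall@K and mAP@K from ranked retrieval results. It assumes one common set of definitions: Recall@K counts a query as a hit if any ground-truth image appears in the top K, and AP@K is normalized by min(K, number of ground truths). The function names, toy queries, and small K values are illustrative assumptions, not the evaluation code used to produce the listed numbers.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant item appears in the top k results, else 0.0."""
    return float(any(item in relevant_ids for item in ranked_ids[:k]))


def average_precision_at_k(ranked_ids, relevant_ids, k):
    """AP@k with the assumed normalizer min(k, number of relevant items)."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    denom = min(k, len(relevant_ids))
    return precision_sum / denom if denom else 0.0


def mean_over_queries(metric, queries, k):
    """Average a per-query metric over (ranking, ground-truth set) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in queries) / len(queries)


# Toy data: each query has a ranked list of retrieved ids and a set of ground truths.
queries = [
    (["a", "b", "c", "d"], {"b", "d"}),  # multiple ground truths for one query
    (["x", "y", "z", "w"], {"w"}),
]

recall_at_2 = mean_over_queries(recall_at_k, queries, 2)
map_at_4 = mean_over_queries(average_precision_at_k, queries, 4)

# Averaged recall in the style of (Recall@10 + Recall@50) / 2, with small K stand-ins.
averaged_recall = (
    mean_over_queries(recall_at_k, queries, 1) + mean_over_queries(recall_at_k, queries, 4)
) / 2
```

With this toy data, `recall_at_2` is 0.5, `map_at_4` is 0.375, and `averaged_recall` is 0.5. mAP@K is the natural choice when a query has several ground truths, since it rewards ranking all of them highly rather than only the first hit.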
