4 months ago

iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval

Abstract

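Given a query consisting of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images that are visually similar to the reference image while incorporating the changes specified in the relative caption. Supervised methods depend on manually labeled datasets, which limits their broad applicability. This work introduces a new task, Zero-Shot CIR (ZS-CIR), which addresses CIR without requiring labeled training data. We propose an approach named iSEARLE (improved zero-Shot composEd imAge Retrieval with textuaL invErsion) that maps the visual information of the reference image into a pseudo-word token in the CLIP token embedding space and combines it with the relative caption. To foster research on ZS-CIR, we release an open-domain benchmarking dataset named CIRCO (Composed Image Retrieval on Common Objects in context), the first CIR dataset in which each query is labeled with multiple ground truths and a semantic categorization. Experimental results show that iSEARLE achieves state-of-the-art performance on three different CIR datasets (FashionIQ, CIRR, and the proposed CIRCO) and in two additional evaluation settings, namely domain conversion and object composition. The dataset, code, and model are publicly available at https://github.com/miccunifi/SEARLE.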
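The retrieval flow described in the abstract (turn the reference image into a pseudo-word token, combine it with the relative caption, then rank gallery images by similarity) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the 3-dimensional vectors, the mean-pooling `toy_text_encoder`, and the function names are hypothetical stand-ins and not the CLIP encoders or the authors' implementation, which is available in the linked repository.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def toy_text_encoder(token_embeddings):
    """Hypothetical stand-in for a text encoder: mean-pools token embeddings."""
    dim = len(token_embeddings[0])
    count = len(token_embeddings)
    return [sum(tok[i] for tok in token_embeddings) / count for i in range(dim)]


def compose_query(pseudo_token, caption_tokens):
    """Treat the pseudo-word token as one more word next to the caption tokens."""
    return toy_text_encoder([pseudo_token] + caption_tokens)


def retrieve(query_feature, gallery, k):
    """Return the names of the k gallery items most similar to the query."""
    ranked = sorted(
        gallery,
        key=lambda name: cosine(query_feature, gallery[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy data (assumed): axis 0 = "dog", axis 1 = "beach", axis 2 = other content.
pseudo_token = [1.0, 0.0, 0.0]       # stands for the reference image content
caption_tokens = [[0.0, 1.0, 0.0]]   # stands for a relative caption such as "on a beach"
gallery = {
    "dog_on_grass": [1.0, 0.0, 1.0],
    "dog_on_beach": [1.0, 1.0, 0.0],
    "cat_on_beach": [0.0, 1.0, 1.0],
}

query = compose_query(pseudo_token, caption_tokens)
top1 = retrieve(query, gallery, k=1)  # top-1 is "dog_on_beach"
```

In this toy setting the composed query lands closest to the gallery item that shares both the reference content and the requested change, which is the behavior ZS-CIR aims for; no labeled triplets are involved at any point.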

Code repositories

miccunifi/searle (PyTorch, mentioned on GitHub)
miccunifi/circo (official, PyTorch, mentioned on GitHub)

Benchmarks

Benchmark | Method | Metric
zero-shot-composed-image-retrieval-zs-cir-on | iSEARLE-OTI (CLIP B/32) | mAP@10: 10.94
zero-shot-composed-image-retrieval-zs-cir-on | iSEARLE (CLIP B/32) | mAP@10: 11.24
zero-shot-composed-image-retrieval-zs-cir-on | iSEARLE-XL-OTI (CLIP L/14) | mAP@10: 12.67
zero-shot-composed-image-retrieval-zs-cir-on | iSEARLE-XL (CLIP L/14) | mAP@10: 13.61
zero-shot-composed-image-retrieval-zs-cir-on-1 | iSEARLE-XL-OTI (CLIP L/14) | R@5: 54.05
zero-shot-composed-image-retrieval-zs-cir-on-1 | iSEARLE-OTI (CLIP B/32) | R@5: 55.18
zero-shot-composed-image-retrieval-zs-cir-on-1 | iSEARLE (CLIP B/32) | R@5: 55.69
zero-shot-composed-image-retrieval-zs-cir-on-1 | iSEARLE-XL (CLIP L/14) | R@5: 54.00
zero-shot-composed-image-retrieval-zs-cir-on-2 | iSEARLE-XL (CLIP L/14) | (Recall@10+Recall@50)/2: 38.24
zero-shot-composed-image-retrieval-zs-cir-on-2 | iSEARLE-OTI (CLIP B/32) | (Recall@10+Recall@50)/2: 34.93
zero-shot-composed-image-retrieval-zs-cir-on-2 | iSEARLE-XL-OTI (CLIP L/14) | (Recall@10+Recall@50)/2: 39.39
zero-shot-composed-image-retrieval-zs-cir-on-2 | iSEARLE (CLIP B/32) | (Recall@10+Recall@50)/2: 34.60
zero-shot-composed-image-retrieval-zs-cir-on-4 | iSEARLE-OTI (CLIP B/32) | Actions Recall@5: 26.63
zero-shot-composed-image-retrieval-zs-cir-on-4 | iSEARLE-XL-OTI (CLIP L/14) | Actions Recall@5: 32.55
zero-shot-composed-image-retrieval-zs-cir-on-4 | iSEARLE-XL (CLIP L/14) | Actions Recall@5: 30.05
zero-shot-composed-image-retrieval-zs-cir-on-4 | iSEARLE (CLIP B/32) | Actions Recall@5: 26.40
zero-shot-composed-image-retrieval-zs-cir-on-5 | iSEARLE-XL (CLIP L/14) | Average Recall: 24.46
zero-shot-composed-image-retrieval-zs-cir-on-5 | iSEARLE-XL-OTI (CLIP L/14) | Average Recall: 22.59
zero-shot-composed-image-retrieval-zs-cir-on-5 | iSEARLE (CLIP B/32) | Average Recall: 16.01
zero-shot-composed-image-retrieval-zs-cir-on-5 | iSEARLE-OTI (CLIP B/32) | Average Recall: 15.62
zero-shot-composed-image-retrieval-zs-cir-on-6 | iSEARLE-XL (CLIP L/14) | (Recall@10+Recall@50)/2: 24.46
zero-shot-composed-image-retrieval-zs-cir-on-6 | iSEARLE-OTI (CLIP B/32) | (Recall@10+Recall@50)/2: 15.62
zero-shot-composed-image-retrieval-zs-cir-on-6 | iSEARLE (CLIP B/32) | (Recall@10+Recall@50)/2: 16.01
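
The metrics listed above (Recall@K, the average of Recall@10 and Recall@50, and mAP@10) can be computed along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the AP@K normalization by min(K, number of ground truths) is one common convention for queries with multiple ground truths, not a definition confirmed by this page. The benchmark values appear to be percentages, whereas the functions below return fractions in [0, 1].

```python
def recall_at_k(ranked, targets, k):
    """1.0 if any ground-truth item appears in the top-k results, else 0.0."""
    return 1.0 if any(item in targets for item in ranked[:k]) else 0.0


def average_precision_at_k(ranked, targets, k):
    """AP@K: sum of precision at each hit within the top k,
    divided by min(k, number of ground truths) (assumed convention)."""
    if not targets:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in targets:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(k, len(targets))


def mean_over_queries(metric, queries, k):
    """Average a per-query metric over (ranked list, ground-truth set) pairs."""
    return sum(metric(ranked, targets, k) for ranked, targets in queries) / len(queries)


# Two toy queries (assumed data): each pairs a ranked result list with its ground truths.
queries = [
    (["a", "b", "c", "d"], {"a", "c"}),  # hits at ranks 1 and 3
    (["x", "y", "z"], {"z"}),            # single hit at rank 3
]

map_at_4 = mean_over_queries(average_precision_at_k, queries, k=4)
recall_at_2 = mean_over_queries(recall_at_k, queries, k=2)
# Averaged recall at two cutoffs, analogous to (Recall@10+Recall@50)/2.
avg_recall = (
    mean_over_queries(recall_at_k, queries, k=2)
    + mean_over_queries(recall_at_k, queries, k=3)
) / 2
```

Recall@K suits benchmarks with a single target per query, while mAP@K rewards ranking several ground truths near the top, which matters when each query is labeled with multiple ground truths.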
