
Abstract
This work aims to adapt large-scale pre-trained vision-language models, such as Contrastive Language-Image Pretraining (CLIP), to object re-identification (Re-ID) tasks under various supervision settings to improve performance. Although the recently proposed CLIP-ReID achieves impressive results via prompt learning, the underlying mechanism and the necessity of prompt learning remain unclear, since Re-ID tasks lack semantic labels. In this paper, we first analyze the role of prompt learning in CLIP-ReID and identify its limitations. Based on these findings, we propose a simple yet effective approach to adapt CLIP for supervised object Re-ID: the image encoder of CLIP is fine-tuned directly with a prototypical contrastive learning (PCL) loss, eliminating the need for prompt learning. Experimental results on both person and vehicle Re-ID datasets show that our method is competitive with CLIP-ReID. Furthermore, we extend the PCL-based CLIP fine-tuning to the unsupervised setting, where it achieves state-of-the-art performance.
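The core idea — pulling each image feature toward its identity prototype while pushing it away from all other prototypes — can be sketched as below. This is a minimal illustration, not the official implementation: the temperature value, the use of feature centroids as prototypes, and the function name `pcl_loss` are assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def pcl_loss(features, labels, prototypes, temperature=0.05):
    """Prototypical contrastive loss (illustrative sketch).

    features:   (B, D) image-encoder outputs for a batch
    labels:     (B,)   identity labels indexing rows of `prototypes`
    prototypes: (C, D) one prototype per identity (e.g. a feature centroid)
    `temperature` is an assumed hyperparameter, not taken from the paper.
    """
    # Cosine similarity between each feature and every prototype
    features = F.normalize(features, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = features @ prototypes.t() / temperature  # (B, C)
    # Softmax cross-entropy against the ground-truth identity:
    # maximizes similarity to the matching prototype relative to the rest
    return F.cross_entropy(logits, labels)
```

In the unsupervised setting described above, the identity labels would come from clustering pseudo-labels rather than annotations, with prototypes maintained as cluster centroids.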
Code Repositories
RikoLi/PCL-CLIP
Official
pytorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| person-re-identification-on-market-1501 | PCL-CLIP (L_pcl) | Rank-1: 96.1, Rank-5: 98.8, mAP: 91.0 |
| person-re-identification-on-market-1501 | PCL-CLIP (L_pcl+L_id) | Rank-1: 95.9, Rank-5: 98.5, mAP: 91.4 |
| person-re-identification-on-msmt17 | PCL-CLIP (L_pcl+L_id) | Rank-1: 89.8, Rank-5: 94.7, Rank-10: 96.0, mAP: 76.1 |
| person-re-identification-on-msmt17 | PCL-CLIP (L_pcl) | Rank-1: 89.2, Rank-5: 94.7, Rank-10: 95.8, mAP: 73.8 |
| unsupervised-person-re-identification-on-12 | PCL-CLIP (CC) | Rank-1: 77.9, Rank-5: 85.2, Rank-10: 87.2, mAP: 56.4 |
| unsupervised-person-re-identification-on-12 | PCL-CLIP (CAP) | Rank-1: 79.0, Rank-5: 88.4, Rank-10: 91.1, mAP: 53.6 |
| unsupervised-person-re-identification-on-12 | PCL-CLIP (O2CAP) | Rank-1: 84.9, Rank-5: 92.0, Rank-10: 94.0, mAP: 65.5 |
| unsupervised-person-re-identification-on-4 | PCL-CLIP (O2CAP) | Rank-1: 94.8, Rank-5: 98.0, Rank-10: 98.7, mAP: 88.4 |
| unsupervised-person-re-identification-on-4 | PCL-CLIP (CC) | Rank-1: 94.2, Rank-5: 97.8, Rank-10: 98.7, mAP: 86.9 |
| unsupervised-person-re-identification-on-4 | PCL-CLIP (CAP) | Rank-1: 93.9, Rank-5: 97.7, Rank-10: 98.5, mAP: 87.4 |