
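Abstract
This work focuses on learning deep visual representation models for image retrieval by exploring the interplay between a new loss function, the batch size, and a new regularization approach. Recall, the evaluation metric used in retrieval, is non-differentiable, so it cannot be optimized directly by gradient descent. The paper therefore proposes a differentiable surrogate loss for recall. Using an implementation that sidesteps the hardware limits of GPU memory, the method trains with very large batches, which is essential for metrics computed over the entire retrieval database. It is further assisted by an efficient mixup regularization approach that operates on pairwise scalar similarities and virtually increases the effective batch size. When used for deep metric learning, the method achieves state-of-the-art performance on several image retrieval benchmarks. For instance-level recognition in particular, it outperforms similar approaches that train with an approximation of average precision.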
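The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the official implementation: it assumes the common relaxation in which each step function in the recall@k definition is replaced by a temperature-scaled sigmoid, and it assumes dot-product similarity for the mixup part, with no re-normalization of the mixed embedding. The function names and the temperatures `tau_rank` and `tau_k` are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(x, tau):
    """Temperature-scaled sigmoid, a smooth stand-in for the step function."""
    return 1.0 / (1.0 + np.exp(-x / tau))

def hard_recall_at_k(sims, is_pos, k):
    """Exact, non-differentiable recall@k for a single query.

    sims:   similarity of the query to every database item
    is_pos: boolean mask marking the items relevant to the query
    """
    pos = np.flatnonzero(is_pos)
    # Rank of a positive = 1 + number of items scoring strictly higher.
    ranks = np.array([1 + np.sum(sims > sims[i]) for i in pos])
    return np.sum(ranks <= k) / min(k, len(pos))

def smooth_recall_at_k(sims, is_pos, k, tau_rank=0.01, tau_k=1.0):
    """Sigmoid relaxation of recall@k (assumed form, see lead-in)."""
    pos = np.flatnonzero(is_pos)
    hits = 0.0
    for i in pos:
        # s_z - s_x for every other item z; a sigmoid replaces the step.
        diff = np.delete(sims, i) - sims[i]
        soft_rank = 1.0 + np.sum(sigmoid(diff, tau_rank))
        # Soft indicator of "rank <= k".
        hits += sigmoid(k - soft_rank, tau_k)
    return hits / min(k, len(pos))

def recall_surrogate_loss(sims, is_pos, ks=(1, 2, 4)):
    """Loss to minimize: 1 minus the smooth recall, averaged over several k."""
    return 1.0 - np.mean([smooth_recall_at_k(sims, is_pos, k) for k in ks])

def mix_similarities(sim_qx, sim_qz, alpha):
    """Similarity of a query to the virtual embedding alpha*x + (1-alpha)*z.

    With dot-product similarity this is linear, so it can be computed from
    the two scalar similarities without building the mixed embedding.
    """
    return alpha * sim_qx + (1.0 - alpha) * sim_qz
```

In an autograd framework the same expressions would be written on tensors so that gradients flow through the sigmoids; the NumPy version only shows the forward computation. Because the mixed similarities are derived from scalars that are already available, virtual examples add entries to the similarity matrix without extra forward passes through the network, which is one way to read the claim that mixup virtually enlarges the batch.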
Code repositories
yash0307/recallatk (official, PyTorch, mentioned on GitHub)
yash0307/RecallatK_surrogate (official, PyTorch, mentioned on GitHub)
Benchmarks
| Benchmark | Method | Metrics (%) |
|---|---|---|
| image-retrieval-on-inaturalist | Recall@k Surrogate loss (ResNet-50) | R@1: 71.8, R@5: 84.7, R@16: 91.9, R@32: 94.3 |
| image-retrieval-on-inaturalist | Recall@k Surrogate loss (ViT-B/16) | R@1: 83.0, R@5: 92.1, R@16: 95.9, R@32: 97.2 |
| metric-learning-on-cars196 | Recall@k Surrogate loss (ResNet-50) | R@1: 88.3 |
| metric-learning-on-cars196 | Recall@k Surrogate loss (ViT-B/16) | R@1: 89.5 |
| metric-learning-on-stanford-online-products-1 | Recall@k Surrogate loss (ViT-B/16) | R@1: 88.0 |
| metric-learning-on-stanford-online-products-1 | Recall@k Surrogate loss (ResNet-50) | R@1: 82.7 |
| metric-learning-on-stanford-online-products-1 | Recall@k Surrogate loss (ViT-B/32) | R@1: 85.1 |
| vehicle-re-identification-on-vehicleid-large | Recall@k Surrogate loss (ViT-B/16) | Rank-1: 94.7, Rank-5: 97.1 |
| vehicle-re-identification-on-vehicleid-large | Recall@k Surrogate loss (ResNet-50) | Rank-1: 93.8, Rank-5: 96.6 |
| vehicle-re-identification-on-vehicleid-medium | Recall@k Surrogate loss (ResNet-50) | Rank-1: 94.6, Rank-5: 96.9 |
| vehicle-re-identification-on-vehicleid-medium | Recall@k Surrogate loss (ViT-B/16) | Rank-1: 95.2, Rank-5: 97.2 |
| vehicle-re-identification-on-vehicleid-small | Recall@k Surrogate loss (ResNet-50) | Rank-1: 95.7, Rank-5: 97.9 |
| vehicle-re-identification-on-vehicleid-small | Recall@k Surrogate loss (ViT-B/16) | Rank-1: 96.2, Rank-5: 98.0 |