Emerging Properties in Self-Supervised Vision Transformers

Abstract

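In this paper, we question whether self-supervised learning provides new properties to the Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations. First, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs or with convnets. Second, these features are also excellent k-nearest-neighbor (k-NN) classifiers, reaching 78.3% top-1 accuracy on ImageNet with a small ViT. Our study also underlines the importance of the momentum encoder, multi-crop training, and the use of small patches with ViTs. We implement our findings in a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels. We show the synergy between DINO and ViTs by achieving 80.1% top-1 accuracy on ImageNet in linear evaluation with ViT-Base.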
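The abstract describes DINO as self-distillation with no labels, built on a momentum encoder and multi-crop training. The following is a minimal sketch of those three pieces in plain Python, not the authors' implementation. All function names are hypothetical. The temperatures (0.1 for the student, 0.04 for the teacher), the momentum value (0.996), and the centering of teacher outputs are typical settings described in the DINO paper rather than in the abstract, and network outputs are represented here by plain lists of floats.

```python
import math


def log_softmax(logits, temperature):
    """Temperature-scaled log-softmax over a list of floats (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def softmax(logits, temperature):
    """Temperature-scaled softmax, derived from log_softmax."""
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def distillation_loss(student_logits, teacher_logits, center,
                      student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the teacher and student distributions for one view pair."""
    # Teacher side: subtract the running center, then sharpen with a low temperature.
    centered = [t - c for t, c in zip(teacher_logits, center)]
    p_teacher = softmax(centered, teacher_temp)
    # Student side: log-probabilities at a higher temperature.
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))


def multicrop_loss(student_outputs, teacher_outputs, center):
    """Average the loss over all (teacher view, student view) pairs of different views.

    Assumption: student_outputs lists the global views first, in the same order
    as teacher_outputs, followed by the local views. The teacher only sees
    global views.
    """
    total, pairs = 0.0, 0
    for i, teacher_logits in enumerate(teacher_outputs):
        for j, student_logits in enumerate(student_outputs):
            if i == j:  # skip the pair where both networks see the same view
                continue
            total += distillation_loss(student_logits, teacher_logits, center)
            pairs += 1
    return total / pairs


def ema_update(teacher_params, student_params, momentum=0.996):
    """Momentum encoder: teacher weights follow an exponential moving average of the student."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


if __name__ == "__main__":
    center = [0.0, 0.0, 0.0]
    teacher_views = [[2.0, 0.0, 0.0], [1.5, 0.2, 0.0]]             # two global views
    student_views = teacher_views + [[1.0, 0.5, 0.0]]              # plus one local view
    print("multi-crop loss:", multicrop_loss(student_views, teacher_views, center))
    print("updated teacher:", ema_update([1.0, 1.0], [0.0, 2.0]))
```

In a real training loop only the student would receive gradients from this loss, while the teacher would be refreshed with `ema_update` after each step. The sketch omits the update of the center itself and any schedule on the momentum or temperatures.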

Benchmarks

Benchmark | Method | Metrics
image-classification-on-omnibenchmark | DINO | Average Top-1 Accuracy: 38.9
image-retrieval-on-roxford-hard | DINO | mAP: 24.3
image-retrieval-on-roxford-medium | DINO | mAP: 51.5
image-retrieval-on-rparis-hard | DINO | mAP: 51.6
image-retrieval-on-rparis-medium | DINO | mAP: 75.3
self-supervised-image-classification-on | DINO (ViT-B/16) | Number of Params: 85M; Top 1 Accuracy: 78.2%
self-supervised-image-classification-on | DINO (ViT-B/8) | Number of Params: 80M; Top 1 Accuracy: 80.1%
self-supervised-image-classification-on | DINO (ViT-S/8) | Number of Params: 21M; Top 1 Accuracy: 79.7%
self-supervised-image-classification-on | DINO (ResNet-50) | Number of Params: 24M; Top 1 Accuracy: 75.3%
self-supervised-image-classification-on | DINO (xcit_medium_24_p8) | Number of Params: 84M; Top 1 Accuracy: 80.3%
self-supervised-image-classification-on | DINO (ViT-S/16) | Number of Params: 21M; Top 1 Accuracy: 77.0%
self-supervised-image-classification-on-1 | DINO (ViT-B/16) | Number of Params: 85M; Top 1 Accuracy: 82.8%
video-object-segmentation-on-davis-2017 | DINO (ViT-B/8, ImageNet retrain) | J&F: 71.4
visual-place-recognition-on-17-places | DINO | Recall@1: 61.82
visual-place-recognition-on-baidu-mall | DINO | Recall@1: 48.30
visual-place-recognition-on-gardens-point | DINO | Recall@1: 78.50
visual-place-recognition-on-hawkins | DINO | Recall@1: 46.61
visual-place-recognition-on-laurel-caverns | DINO | Recall@1: 41.07
visual-place-recognition-on-mid-atlantic | DINO | Recall@1: 27.72
visual-place-recognition-on-nardo-air | DINO | Recall@1: 57.75
visual-place-recognition-on-nardo-air-r | DINO | Recall@1: 84.51
visual-place-recognition-on-oxford-robotcar-4 | DINO | Recall@1: 15.71
visual-place-recognition-on-pittsburgh-30k | DINO | Recall@1: 70.13
visual-place-recognition-on-st-lucia | DINO | Recall@1: 45.22
visual-place-recognition-on-vp-air | DINO | Recall@1: 24.02