Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski

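Abstract
Discriminative self-supervised learning makes it possible to train models on any random collection of internet images and potentially recover salient information that helps distinguish between images. Applied to ImageNet, it yields object-centric features that perform on par with supervised features on most object-centric downstream tasks. In this work, we ask whether this ability is sufficient to learn more representative and salient information from a diverse, unbounded set of images from across the globe. To that end, we train models on billions of random images without any data pre-processing or prior assumptions about what the model should learn. To avoid underfitting on data of this scale, we scale the model up to a dense architecture with up to 10 billion parameters. We extensively evaluate and validate the model on more than 50 benchmarks covering fairness, robustness to distribution shift, geographical diversity, fine-grained recognition, image copy detection, and many image classification datasets. The resulting model not only captures semantic information well, but also learns salient properties such as artistic style and geolocation, as well as multilingual word embeddings, from visual content alone. More importantly, we find that such a model is more robust, fairer, less harmful, and less biased than supervised models or models trained on object-centric datasets such as ImageNet.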
Code repository
facebookresearch/vissl (official, PyTorch, mentioned on GitHub)
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| action-classification-on-kinetics-700 | SEER (RegNet10B) | Top-1 Accuracy: 51.9 |
| domain-generalization-on-imagenet-a | SEER (RegNet10B) | Top-1 accuracy %: 52.7 |
| domain-generalization-on-imagenet-r | SEER (RegNet10B) | Top-1 Error Rate: 43.9 |
| domain-generalization-on-imagenet-sketch | SEER (RegNet10B) | Top-1 accuracy: 45.6 |
| fine-grained-image-classification-on-caltech | SEER (RegNet10B - linear eval) | Accuracy: 91.0; Top-1 Error Rate: 9.0% |
| fine-grained-image-classification-on-fgvc | SEER (RegNet10B) | Accuracy: 54.82% |
| fine-grained-image-classification-on-oxford-1 | SEER (RegNet10B) | Accuracy: 85.3% |
| fine-grained-image-classification-on-stanford | SEER (RegNet10B) | Accuracy: 68.03% |
| fine-grained-image-classification-on-sun397 | SEER (RegNet10B - linear eval) | Accuracy: 80.0 |
| image-classification-on-cifar-10 | SEER (RegNet10B) | Percentage correct: 90 |
| image-classification-on-cifar-100 | SEER (RegNet10B) | Percentage correct: 81.53 |
| image-classification-on-clevr-count | SEER (RegNet10B) | Top 1 Accuracy: 89.28 |
| image-classification-on-clevr-count | SEER (RegNetY-128GF) | Top 1 Accuracy: 87.98 |
| image-classification-on-clevr-dist | SEER (RegNet10B) | Top 1 Accuracy: 74.98 |
| image-classification-on-clevr-dist | SEER (RegNetY-128GF) | Top 1 Accuracy: 72.67 |
| image-classification-on-dtd | SEER (RegNet10B - linear eval) | Accuracy: 80.5 |
| image-classification-on-eurosat | SEER (RegNet10B - linear eval) | Accuracy (%): 97.5 |
| image-classification-on-flowers-102 | SEER (RegNet10B) | Accuracy: 96.3 |
| image-classification-on-food-101-1 | SEER (RegNet10B - linear eval) | Accuracy (%): 90.3 |
| image-classification-on-imagenet | SEER (RG-10B) | Number of params: 10000M; Top 1 Accuracy: 85.8% |
| image-classification-on-imagenet-real | SEER (RegNet10B) | Accuracy: 89.8%; Params: 10000M |
| image-classification-on-imagenet-v2 | SEER (RegNet10B) | Top 1 Accuracy: 76.2 |
| image-classification-on-inaturalist-2018 | SEER (RegNet10B - finetuned - 384px) | Top-1 Accuracy: 84.7% |
| image-classification-on-kitti-dist | SEER (RegNet10B) | Top 1 Accuracy: 78.34 |
| image-classification-on-mnist | SEER (RegNet10B) | Accuracy: 99.42; Percentage error: 0.58 |
| image-classification-on-objectnet | SEER (RegNet10B) | Top-1 Accuracy: 60.2 |
| image-classification-on-places205 | SEER (RegNet10B - finetuned - 384px) | Top 1 Accuracy: 69.0 |
| image-classification-on-resisc45 | ResNet50 (ImageNet-supervised) | Top 1 Accuracy: 88.56 |
| image-classification-on-resisc45 | DeiT-B/16 | Top 1 Accuracy: 92.48 |
| image-classification-on-resisc45 | SimCLR-v2 (ResNet152-w3 + SK) | Top 1 Accuracy: 89.77 |
| image-classification-on-resisc45 | MoCo-v3 (ViT-B/16) | Top 1 Accuracy: 93.35 |
| image-classification-on-resisc45 | SwAV (ResNet50-w5) | Top 1 Accuracy: 94.73 |
| image-classification-on-resisc45 | MoCo-v2 (ResNet50) | Top 1 Accuracy: 85.4 |
| image-classification-on-resisc45 | SEER (RegNet10B) | Top 1 Accuracy: 95.61 |
| image-classification-on-resisc45 | CLIP (ViT-B/16) | Top 1 Accuracy: 92.7 |
| image-classification-on-resisc45 | DINO (DeiT-B/16) | Top 1 Accuracy: 93.97 |
| image-classification-on-stl-10 | SEER (RegNet10B) | Params: 10000M; Percentage correct: 97.3 |
| image-classification-on-svhn | SEER (RegNet10B) | Percentage error: 13.6 |
| meme-classification-on-hateful-memes | SEER (RegNet10B) | ROC-AUC: 0.734 |
| self-supervised-image-classification-on-1 | SEER (RegNet10B) | Number of params: 10000M; Top 1 Accuracy: 85.8% |
| semi-supervised-image-classification-on-1 | SEER (RegNet10B) | Top 1 Accuracy: 62.4% |
| semi-supervised-image-classification-on-2 | SEER (RegNet10B) | Top 1 Accuracy: 78.8% |
| traffic-sign-recognition-on-gtsrb | SEER (RegNet10B) | Accuracy: 90.71% |
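
The table mixes two reporting conventions: most rows give top-1 accuracy, a few (ImageNet-R, SVHN) give a top-1 error rate, and the Caltech and MNIST rows give both. The two are complementary percentages. The minimal Python sketch below illustrates the relationship; the helper names are hypothetical and are not part of VISSL, and the prediction lists are made-up toy values rather than data from the paper.

```python
def top1_accuracy(predictions, labels):
    """Percentage of samples whose predicted class equals the true label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def error_to_accuracy(error_rate_pct):
    """Convert a top-1 error rate (in percent) to top-1 accuracy (in percent)."""
    return 100.0 - error_rate_pct


# Toy example (hypothetical predictions): 3 of 4 samples are correct.
print(top1_accuracy([0, 2, 1, 1], [0, 2, 2, 1]))

# ImageNet-R row: a top-1 error rate of 43.9 is the complement of the accuracy.
print(round(error_to_accuracy(43.9), 1))
```

Converting the error-rate rows this way puts them on the same scale as the accuracy rows when comparing benchmarks.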