
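Abstract
Transferring pre-trained representations substantially improves the sample efficiency of deep neural networks on vision tasks and simplifies hyperparameter tuning. We revisit the classic paradigm of pre-training on a large supervised dataset and then fine-tuning the model on a target task. By scaling up pre-training and proposing a simple training recipe, which we call Big Transfer (BiT), we achieve strong performance on more than 20 datasets. BiT performs well across a very wide range of data regimes, from 1 example per class to 1M examples in total. On ImageNet (ILSVRC-2012), BiT reaches 87.5% top-1 accuracy; it reaches 99.4% on CIFAR-10 and 76.3% on the 19-task Visual Task Adaptation Benchmark (VTAB). In the few-shot setting, with only 10 examples per class, BiT still attains 76.8% on ILSVRC-2012 and 97.0% on CIFAR-10. We conduct an in-depth analysis of the key components that affect transfer performance and of the mechanisms behind its success.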
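
The few-shot regime mentioned in the abstract (a fixed number of labeled examples per class) can be illustrated with a small subsampling helper. This is only a minimal sketch under stated assumptions: the function name `sample_k_per_class`, the fixed seed, and the toy label list are hypothetical and are not taken from the paper or its official code.

```python
import random
from collections import defaultdict


def sample_k_per_class(labels, k, seed=0):
    """Return sorted dataset indices with at most k examples per class.

    Hypothetical helper for illustration; not the authors' sampling code.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)        # random order within the class
        chosen.extend(indices[:k])  # keep at most k examples
    return sorted(chosen)


# Toy labels: 3 classes with 50 examples each (made-up data).
labels = [i % 3 for i in range(150)]
subset = sample_k_per_class(labels, k=10)
print(len(subset))  # 3 classes x 10 examples = 30
```

A fine-tuning run in the 10-examples-per-class setting would then train only on the examples indexed by `subset`, while evaluation still uses the full test set.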
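Code repositories

| Repository | Framework | Notes |
|---|---|---|
| sayakpaul/FunMatch-Distillation | TensorFlow | Mentioned in GitHub |
| batsresearch/taglets | PyTorch | Mentioned in GitHub |
| SoojungYang/supervised_pretraining_GN_WS | TensorFlow | Mentioned in GitHub |
| bethgelab/InDomainGeneralizationBenchmark | PyTorch | Mentioned in GitHub |
| google-research/big_transfer | JAX | Official; mentioned in GitHub |
| hw666666666666/BigTransfer | MindSpore | |
| sayakpaul/A-Barebones-Image-Retrieval-System | TensorFlow | Mentioned in GitHub |
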
Benchmarks

| Benchmark | Method | Metrics |
|---|---|---|
| fine-grained-image-classification-on-oxford | BiT-M (ResNet) | Accuracy: 99.30%; Top-1 error rate: 0.70% |
| fine-grained-image-classification-on-oxford | BiT-L (ResNet) | Accuracy: 99.63%; Top-1 error rate: 0.37% |
| fine-grained-image-classification-on-oxford-2 | BiT-L (ResNet) | Accuracy: 96.62%; Top-1 error rate: 3.38% |
| fine-grained-image-classification-on-oxford-2 | BiT-M (ResNet) | Accuracy: 94.47%; Top-1 error rate: 5.53% |
| image-classification-on-cifar-10 | BiT-L (ResNet) | Percentage correct: 99.37% |
| image-classification-on-cifar-10 | BiT-M (ResNet) | Percentage correct: 98.91% |
| image-classification-on-cifar-100 | BiT-M (ResNet) | Percentage correct: 92.17% |
| image-classification-on-cifar-100 | BiT-L (ResNet) | Percentage correct: 93.51% |
| image-classification-on-flowers-102 | BiT-L (ResNet) | Accuracy: 99.63% |
| image-classification-on-flowers-102 | BiT-M (ResNet) | Accuracy: 99.30% |
| image-classification-on-imagenet | BiT-M (ResNet) | Number of params: 928M; Top-1 accuracy: 85.39% |
| image-classification-on-imagenet | BiT-L (ResNet) | Top-1 accuracy: 87.54%; Top-5 accuracy: 98.46% |
| image-classification-on-imagenet-real | BiT-L | Accuracy: 90.54%; Number of params: 928M |
| image-classification-on-imagenet-real | BiT-M | Accuracy: 89.02% |
| image-classification-on-objectnet | BiT-L (ResNet-152x4) | Top-1 accuracy: 58.7%; Top-5 accuracy: 80% |
| image-classification-on-objectnet | BiT-M (ResNet-152x4) | Top-1 accuracy: 47.0%; Top-5 accuracy: 69% |
| image-classification-on-objectnet | BiT-S (ResNet-152x4) | Top-1 accuracy: 36.0%; Top-5 accuracy: 57% |
| image-classification-on-objectnet-bounding | BiT-S (ResNet) | Top-5 accuracy: 64.4% |
| image-classification-on-objectnet-bounding | BiT-M (ResNet) | Top-5 accuracy: 76.0% |
| image-classification-on-objectnet-bounding | BiT-L (ResNet) | Top-5 accuracy: 85.1% |
| image-classification-on-omnibenchmark | BiT-M | Average top-1 accuracy: 40.4% |
| image-classification-on-vtab-1k-1 | BiT-S | Top-1 accuracy: 66.9% |
| image-classification-on-vtab-1k-1 | BiT-L | Top-1 accuracy: 76.3% |
| image-classification-on-vtab-1k-1 | BiT-L (50 hypers/task) | Top-1 accuracy: 78.72% |
| image-classification-on-vtab-1k-1 | BiT-M | Top-1 accuracy: 70.6% |
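
Several rows above report both an accuracy and a top-1 error rate. Assuming both are percentages of the same test set, they should sum to 100, which the short check below verifies for the four rows that list both metrics. The helper name `top1_error_pct` is hypothetical, and the numbers are copied from the table.

```python
def top1_error_pct(accuracy_pct):
    """Top-1 error rate in percent, given top-1 accuracy in percent."""
    return round(100.0 - accuracy_pct, 2)


# (accuracy %, reported top-1 error %) pairs copied from the table.
reported = {
    ("fine-grained-image-classification-on-oxford", "BiT-M"): (99.30, 0.70),
    ("fine-grained-image-classification-on-oxford", "BiT-L"): (99.63, 0.37),
    ("fine-grained-image-classification-on-oxford-2", "BiT-L"): (96.62, 3.38),
    ("fine-grained-image-classification-on-oxford-2", "BiT-M"): (94.47, 5.53),
}

for (benchmark, method), (acc, err) in reported.items():
    # Each reported error rate should equal 100 minus the accuracy.
    assert top1_error_pct(acc) == err, (benchmark, method)
    print(f"{benchmark} | {method}: error = {top1_error_pct(acc):.2f}%")
```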