Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?
Nenad Tomasev, Ioana Bica, Brian McWilliams, Lars Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic

Abstract
Despite recent progress by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, which limits their applicability in performance-critical settings. Building on prior theoretical insights from ReLIC [Mitrovic et al., 2021], we incorporate additional inductive biases into self-supervised learning. We propose a new self-supervised representation learning method, ReLICv2, which combines an explicit invariance loss with a contrastive objective over a varied set of appropriately constructed data views to avoid learning spurious correlations and obtain more informative representations. ReLICv2 achieves $77.1\%$ top-$1$ accuracy on ImageNet under linear evaluation on a ResNet50, improving on the previous state of the art by an absolute $+1.5\%$; on larger ResNet models, ReLICv2 achieves up to $80.6\%$, outperforming previous self-supervised approaches by margins of up to $+2.3\%$. Most notably, ReLICv2 is the first unsupervised representation learning method to consistently outperform the supervised baseline in a like-for-like comparison over a range of ResNet architectures. Using ReLICv2, we also learn more robust and transferable representations that generalize better out-of-distribution than previous work, both on image classification and semantic segmentation. Finally, we show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.
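The abstract does not spell out the objective, so the following is only an illustrative sketch of what "an explicit invariance loss combined with a contrastive objective" could look like for two views of a batch. It assumes an InfoNCE-style contrastive term and a KL-divergence penalty between the similarity distributions of the two views. The function names, the `alpha` weight, and the `temperature` value are hypothetical choices for this example, not taken from the paper, and the sketch omits the paper's view construction and any encoder or training details.

```python
import math


def _normalize(v):
    # Scale a vector to unit length (guarding against the zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax(row):
    # Numerically stable log-softmax over a list of scores.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def similarity_log_probs(anchors, targets, temperature=0.1):
    # Row i holds log p(j | i): a softmax over the cosine similarities
    # between anchor i and every target j, scaled by the temperature.
    a = [_normalize(v) for v in anchors]
    t = [_normalize(v) for v in targets]
    return [
        _log_softmax([sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t])
        for ai in a
    ]


def contrastive_plus_invariance_loss(view1, view2, alpha=1.0, temperature=0.1):
    # view1[i] and view2[i] are embeddings of two views of the same image.
    # alpha (hypothetical) weights the invariance penalty.
    n = len(view1)
    lp12 = similarity_log_probs(view1, view2, temperature)
    lp21 = similarity_log_probs(view2, view1, temperature)

    # Contrastive term: the positive for item i is item i in the other view.
    contrastive = -sum(lp12[i][i] + lp21[i][i] for i in range(n)) / (2 * n)

    # Invariance term (assumed form): KL(p12_i || p21_i), averaged over the
    # batch, which penalizes the two views for inducing different
    # similarity distributions.
    invariance = sum(
        sum(math.exp(p) * (p - q) for p, q in zip(lp12[i], lp21[i]))
        for i in range(n)
    ) / n

    return contrastive + alpha * invariance, contrastive, invariance


if __name__ == "__main__":
    v1 = [[1.0, 0.0], [0.0, 1.0]]
    v2 = [[0.9, 0.1], [0.2, 0.8]]
    total, contrastive, invariance = contrastive_plus_invariance_loss(v1, v2)
    print(total, contrastive, invariance)
```

When both views produce identical embeddings, the invariance term in this sketch vanishes and only the contrastive term remains, which is the behavior one would expect from a penalty that measures disagreement between views.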
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| image-classification-on-objectnet | SimCLR | Top-1 Accuracy: 14.6% |
| image-classification-on-objectnet | ReLICv2 | Top-1 Accuracy: 25.9% |
| image-classification-on-objectnet | ReLIC | Top-1 Accuracy: 23.8% |
| image-classification-on-objectnet | BYOL | Top-1 Accuracy: 23% |
| self-supervised-image-classification-on | ReLICv2 (ResNet-101) | Number of Params: 44M; Top-1 Accuracy: 78.7% |
| self-supervised-image-classification-on | ReLICv2 (ResNet-200 x2) | Number of Params: 250M; Top-1 Accuracy: 80.6% |
| self-supervised-image-classification-on | ReLICv2 (ResNet-50) | Number of Params: 25M; Top-1 Accuracy: 77.1% |
| self-supervised-image-classification-on | ReLICv2 (ResNet-200) | Number of Params: 63M; Top-1 Accuracy: 79.8% |
| self-supervised-image-classification-on | ReLICv2 (ResNet-50 x4) | Number of Params: 375M; Top-1 Accuracy: 79.4% |
| self-supervised-image-classification-on | ReLICv2 (ResNet-152) | Number of Params: 58M; Top-1 Accuracy: 79.3% |
| self-supervised-image-classification-on | ReLICv2 (ResNet-50 x2) | Number of Params: 94M; Top-1 Accuracy: 79% |
| semantic-segmentation-on-cityscapes-val | BYOL | mIoU: 74.6% |
| semantic-segmentation-on-cityscapes-val | ReLICv2 | mIoU: 75.2% |
| semantic-segmentation-on-pascal-voc-2012-val | ReLICv2 | mIoU: 77.9% |
| semantic-segmentation-on-pascal-voc-2012-val | BYOL | mIoU: 75.7% |
| semantic-segmentation-on-pascal-voc-2012-val | DetCon | mIoU: 77.3% |
| semi-supervised-image-classification-on-1 | ReLICv2 | Top-1 Accuracy: 58.1%; Top-5 Accuracy: 81.3% |
| semi-supervised-image-classification-on-2 | ReLICv2 (ResNet-50) | Top-1 Accuracy: 72.4%; Top-5 Accuracy: 91.2% |