MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin; Lukas Miklautz; Sepp Hochreiter; Johannes Brandstetter

Abstract
We introduce MIM (Masked Image Modeling)-Refiner, a contrastive learning boost for pre-trained MIM models. MIM-Refiner is motivated by the insight that strong representations within MIM models generally reside in intermediate layers. Accordingly, MIM-Refiner leverages multiple contrastive heads that are connected to different intermediate layers. In each head, a modified nearest neighbor objective constructs semantic clusters, and the semantic information they capture improves performance on downstream tasks in both off-the-shelf and fine-tuning settings.

The refinement process is short and simple, yet highly effective. Within a few epochs, it turns the subpar features of MIM models into state-of-the-art off-the-shelf features. Refining a ViT-H pre-trained with data2vec 2.0 on ImageNet-1K sets a new state of the art in linear probing (84.7%) and low-shot classification among models pre-trained on ImageNet-1K. MIM-Refiner efficiently combines the advantages of MIM and instance discrimination (ID) objectives and compares favorably against previous state-of-the-art self-supervised learning (SSL) models on a variety of benchmarks such as low-shot classification, long-tailed classification, clustering and semantic segmentation.
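The abstract describes contrastive heads attached to several intermediate layers, each trained with a nearest-neighbor objective. The NumPy sketch below illustrates that structure under stated assumptions: it uses a standard NNCLR-style loss (each view-1 embedding is swapped for its nearest neighbor in a support queue, then scored with InfoNCE against view 2) as a stand-in for the paper's modified objective, whose exact form is not given here. The block indices, linear heads, queue contents, temperature, and the averaging of per-head losses are all hypothetical choices for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def nn_contrastive_loss(z1, z2, queue, temperature=0.1):
    """NNCLR-style loss, used here as an assumed stand-in for the paper's objective.

    z1, z2: (batch, dim) embeddings of two augmented views from one head.
    queue:  (queue_size, dim) support set of earlier embeddings (hypothetical).
    """
    z1, z2, queue = l2_normalize(z1), l2_normalize(z2), l2_normalize(queue)
    # Swap each view-1 embedding for its most similar entry in the support set.
    nn = queue[np.argmax(z1 @ queue.T, axis=1)]
    # InfoNCE: row i's positive is view-2 sample i; other batch samples are negatives.
    logits = nn @ z2.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def multi_head_loss(feats_v1, feats_v2, heads, queues, temperature=0.1):
    """Average the per-head losses over heads attached to intermediate blocks.

    feats_v*: {block_index: (batch, width)} pooled block outputs for each view.
    heads:    {block_index: (width, dim)} hypothetical linear projection heads.
    queues:   {block_index: (queue_size, dim)} one support set per head.
    """
    losses = [
        nn_contrastive_loss(feats_v1[b] @ w, feats_v2[b] @ w, queues[b], temperature)
        for b, w in heads.items()
    ]
    return float(np.mean(losses))


# Toy usage with random stand-ins for ViT block outputs.
rng = np.random.default_rng(0)
blocks = (20, 24, 28)  # hypothetical indices of the blocks that receive heads
feats = {b: rng.normal(size=(8, 64)) for b in blocks}
heads = {b: rng.normal(size=(64, 32)) / 8.0 for b in blocks}
# Identical views and a queue holding the current embeddings: each nearest
# neighbor is the sample itself, so the positive logit is the row maximum.
queues = {b: feats[b] @ heads[b] for b in blocks}
loss = multi_head_loss(feats, feats, heads, queues)
```

In NNCLR-style training the queue would instead hold embeddings from earlier batches, so the nearest neighbor is usually a different image with similar content; using it as the positive is what pulls semantically related samples together.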
Benchmarks
| Benchmark | Method | ARI | Accuracy | NMI |
|---|---|---|---|---|
| image-clustering-on-imagenet | MIM-Refiner (D2V2-ViT-H/14) | 42.2 | 67.3 | 87.2 |
| image-clustering-on-imagenet | MIM-Refiner (MAE-ViT-H/14) | 45.5 | 64.6 | 85.3 |

| Benchmark | Method | Number of Params | Top-1 Accuracy |
|---|---|---|---|
| self-supervised-image-classification-on | MIM-Refiner (MAE-ViT-2B/14) | 1890M | 84.5% |
| self-supervised-image-classification-on | MIM-Refiner (MAE-ViT-H/14) | 632M | 83.7% |
| self-supervised-image-classification-on | MIM-Refiner (MAE-ViT-L/16) | 307M | 82.8% |
| self-supervised-image-classification-on | MIM-Refiner (D2V2-ViT-H/14) | 632M | 84.7% |
| self-supervised-image-classification-on | MIM-Refiner (D2V2-ViT-L/16) | 307M | 83.5% |
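The clustering results above are reported as ARI, accuracy and NMI. The sketch below shows how these metrics are commonly computed from ground-truth classes and predicted cluster IDs, assuming NumPy and SciPy. Hungarian-matched accuracy and arithmetic-mean NMI normalization are the usual conventions but are assumptions here, not taken from the paper's evaluation code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def contingency(labels_true, labels_pred):
    # Count how many samples of each class fall into each predicted cluster.
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    return table


def cluster_accuracy(labels_true, labels_pred):
    table = contingency(labels_true, labels_pred)
    # Hungarian matching: map clusters to classes one-to-one, maximizing hits.
    rows, cols = linear_sum_assignment(-table)
    return table[rows, cols].sum() / table.sum()


def nmi(labels_true, labels_pred):
    pij = contingency(labels_true, labels_pred).astype(float)
    pij /= pij.sum()
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    nz = pij > 0
    mutual_info = np.sum(pij[nz] * np.log(pij[nz] / np.outer(pi, pj)[nz]))
    h_true = -np.sum(pi[pi > 0] * np.log(pi[pi > 0]))
    h_pred = -np.sum(pj[pj > 0] * np.log(pj[pj > 0]))
    denom = (h_true + h_pred) / 2  # arithmetic-mean normalization (assumed)
    return mutual_info / denom if denom > 0 else 1.0


def ari(labels_true, labels_pred):
    table = contingency(labels_true, labels_pred)
    pairs = lambda x: x * (x - 1) / 2  # number of unordered pairs
    sum_ij = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(table.sum())
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Because accuracy is taken after a one-to-one matching, and NMI and ARI depend only on the contingency table, relabeling the clusters leaves all three scores unchanged.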