Command Palette
Search for a command to run...
Zhaowei Cai Avinash Ravichandran Paolo Favaro Manchen Wang Davide Modolo Rahul Bhotika Zhuowen Tu Stefano Soatto

Abstract
We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks. To tackle this problem, we propose a new SSL pipeline, consisting of first un/self-supervised pre-training, followed by supervised fine-tuning, and finally semi-supervised fine-tuning. At the semi-supervised fine-tuning stage, we adopt an exponential moving average (EMA)-Teacher framework instead of the popular FixMatch, since the former is more stable and delivers higher accuracy for semi-supervised vision transformers. In addition, we propose a probabilistic pseudo mixup mechanism to interpolate unlabeled samples and their pseudo labels for improved regularization, which is important for training ViTs with weak inductive bias. Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than the CNN counterparts in the semi-supervised classification setting. Semi-ViT also enjoys the scalability benefits of ViTs that can be readily scaled up to large-size models with increasing accuracies. For example, Semi-ViT-Huge achieves an impressive 80% top-1 accuracy on ImageNet using only 1% labels, which is comparable with Inception-v4 using 100% ImageNet labels.
Code Repositories
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| semi-supervised-image-classification-on-1 | Semi-ViT (ViT-Base) | Top 1 Accuracy: 71% |
| semi-supervised-image-classification-on-1 | Semi-ViT (ViT-Huge) | Top 1 Accuracy: 80% Top 5 Accuracy: 93.1 |
| semi-supervised-image-classification-on-1 | Semi-ViT (ViT-Large) | Top 1 Accuracy: 77.3% |
| semi-supervised-image-classification-on-2 | Semi-ViT (ViT-Large) | Top 1 Accuracy: 83.3% |
| semi-supervised-image-classification-on-2 | Semi-ViT (ViT-Small) | Top 1 Accuracy: 77.1% |
| semi-supervised-image-classification-on-2 | Semi-ViT (ViT-Base) | Top 1 Accuracy: 79.7% |
| semi-supervised-image-classification-on-2 | Semi-ViT (ViT-Huge) | Top 1 Accuracy: 84.3% Top 5 Accuracy: 96.6% |
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.