| Meta Pseudo Labels (EfficientNet-B6-Wide) | 91.12% | - | Meta Pseudo Labels | |
| Meta Pseudo Labels (EfficientNet-L2) | 91.02% | - | Meta Pseudo Labels | |
| CvT-W24 (384 res, ImageNet-22k pre-train) | 90.6% | - | CvT: Introducing Convolutions to Vision Transformers | |
| Mixer-H/14-448 (JFT-300M pre-train) | 90.18% | 409M | MLP-Mixer: An all-MLP Architecture for Vision | |