| Efficient Adaptive Ensembling | 96.879 | Efficient Adaptive Ensembling for Image Classification | - |
| CeiT-S (384 finetune resolution) | 94.1 | Incorporating Convolution Designs into Visual Transformers | |
| CeiT-T (384 finetune resolution) | 93 | Incorporating Convolution Designs into Visual Transformers | |