Image Classification on ImageNet ReaL

Evaluation Metrics

Accuracy
Params
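As a rough sketch of how these two metrics are computed: ReaL accuracy ("Reassessed Labels", Beyer et al., 2020, "Are we done with ImageNet?") counts a top-1 prediction as correct if it falls within an image's set of reassessed labels, and Params is the total parameter count of the model. The Python below is a minimal illustration under those assumptions; the function and variable names are hypothetical, not taken from any benchmark harness.

```python
from typing import Dict, List

import torch.nn as nn


def real_accuracy(top1_pred: Dict[str, int],
                  real_labels: Dict[str, List[int]]) -> float:
    """ImageNet ReaL accuracy (sketch): a top-1 prediction is correct if it
    lies in the image's set of reassessed labels; images whose reassessed
    label set is empty are excluded from the evaluation."""
    correct = total = 0
    for image_id, labels in real_labels.items():
        if not labels:  # no valid ReaL labels for this image -> skip it
            continue
        total += 1
        correct += top1_pred[image_id] in labels
    return correct / total


def param_count_millions(model: nn.Module) -> float:
    """Params column (sketch): total number of model weights, in millions."""
    return sum(p.numel() for p in model.parameters()) / 1e6
```

Under this convention, an entry of 1843M in the Params column corresponds to roughly 1.84 billion weights.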

Evaluation Results

Performance of each model on this benchmark (top 20 of 57 entries shown).

| Model | Accuracy | Params | Paper Title | Repository |
|---|---|---|---|---|
| Baseline (ViT-G/14) | 91.78% | - | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | - |
| Model soups (ViT-G/14) | 91.20% | 1843M | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | - |
| ViTAE-H (MAE, 512) | 91.2% | 644M | ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond | - |
| Meta Pseudo Labels (EfficientNet-B6-Wide) | 91.12% | - | Meta Pseudo Labels | - |
| MAWS (ViT-6.5B) | 91.1% | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining | - |
| TokenLearner L/8 (24+11) | 91.05% | 460M | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | - |
| Model soups (BASIC-L) | 91.03% | 2440M | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | - |
| Meta Pseudo Labels (EfficientNet-L2) | 91.02% | - | Meta Pseudo Labels | - |
| FixEfficientNet-L2 | 90.9% | 480M | Fixing the train-test resolution discrepancy: FixEfficientNet | - |
| MAWS (ViT-2B) | 90.9% | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining | - |
| ViT-G/14 | 90.81% | - | Scaling Vision Transformers | - |
| MAWS (ViT-H) | 90.8% | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining | - |
| SWAG (RegNetY 128GF) | 90.7% | - | Revisiting Weakly Supervised Pre-Training of Visual Perception Models | - |
| VOLO-D5 | 90.6% | - | VOLO: Vision Outlooker for Visual Recognition | - |
| CvT-W24 (384 res, ImageNet-22k pretrain) | 90.6% | - | CvT: Introducing Convolutions to Vision Transformers | - |
| EfficientNet-L2 | 90.55% | 480M | Self-training with Noisy Student improves ImageNet classification | - |
| BiT-L | 90.54% | 928M | Big Transfer (BiT): General Visual Representation Learning | - |
| VOLO-D4 | 90.5% | - | VOLO: Vision Outlooker for Visual Recognition | - |
| CAIT-M36-448 | 90.2% | - | - | - |
| Mixer-H/14-448 (JFT-300M pre-train) | 90.18% | 409M | MLP-Mixer: An all-MLP Architecture for Vision | - |