Image Classification on ImageNet V2

Evaluation Metric

Top-1 Accuracy
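
Top-1 accuracy is the fraction of test images for which the model's highest-scoring class matches the ground-truth label. The snippet below is a minimal NumPy sketch of that computation, not the benchmark's official evaluation code; the names `top1_accuracy`, `logits`, and `labels` are illustrative only.

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class matches the label.

    logits: (N, num_classes) array of per-class scores or probabilities.
    labels: (N,) array of ground-truth class indices.
    """
    predictions = logits.argmax(axis=1)  # predicted class per sample
    return float((predictions == labels).mean())

# Toy usage: 4 samples, 3 classes (values are illustrative only).
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 3))
truth = np.array([0, 2, 1, 1])
print(f"Top-1 accuracy: {top1_accuracy(scores, truth):.2%}")
```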

Evaluation Results

Performance of each model on this benchmark.

| Model | Top-1 Accuracy (%) | Paper |
| --- | --- | --- |
| Model soups (BASIC-L) | 84.63 | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| ViT-e | 84.3 | PaLI: A Jointly-Scaled Multilingual Language-Image Model |
| Model soups (ViT-G/14) | 84.22 | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| MAWS (ViT-6.5B) | 84.0 | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| SwinV2-G | 84.00 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| ViT-G/14 | 83.33 | Scaling Vision Transformers |
| MAWS (ViT-2B) | 83.0 | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| MOAT-4 (IN-22K pretraining) | 81.5 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| SWAG (ViT H/14) | 81.1 | Revisiting Weakly Supervised Pre-Training of Visual Perception Models |
| MOAT-3 (IN-22K pretraining) | 80.6 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| MOAT-2 (IN-22K pretraining) | 79.3 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| MOAT-1 (IN-22K pretraining) | 78.4 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models |
| SwinV2-B | 78.08 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| VOLO-D5 | 78.0 | VOLO: Vision Outlooker for Visual Recognition |
| VOLO-D4 | 77.8 | VOLO: Vision Outlooker for Visual Recognition |
| CAIT-M36-448 | 76.7 | -- |
| SEER (RegNet10B) | 76.2 | Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision |
| ResMLP-B24/8 22k | 74.2 | ResMLP: Feedforward networks for image classification with data-efficient training |
| ViT-B-36x1 | 73.9 | Three things everyone should know about Vision Transformers |
| ResMLP-B24/8 | 73.4 | ResMLP: Feedforward networks for image classification with data-efficient training |