Image Classification on ImageNet

Evaluation Metrics

Hardware Burden
Number of params
Operations per network pass
Top 1 Accuracy
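
Top 1 Accuracy is the share of evaluation images for which the model's single highest-scoring class matches the ground-truth label. The following is a minimal sketch of that calculation in plain Python; the function name `top1_accuracy` and the toy scores and labels are hypothetical and used only for illustration, and they are not taken from any model on this leaderboard.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the fraction of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy data (made up for illustration): 4 samples, 3 classes.
toy_scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.2, 0.5, 0.3],  # predicted class 1
]
toy_labels = [1, 0, 1, 1]

print(f"Top-1 accuracy: {top1_accuracy(toy_scores, toy_labels):.1%}")  # prints "Top-1 accuracy: 75.0%"
```

On ImageNet the same calculation is typically run over the 50,000-image validation set, which is why leaderboard differences of a few hundredths of a percent correspond to only a handful of images.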

Evaluation Results

Performance of each model on this benchmark

| Model | Hardware Burden | Number of params | Operations per network pass | Top 1 Accuracy | Paper Title |
|---|---|---|---|---|---|
| CoCa (finetuned) | - | 2100M | - | 91.0% | CoCa: Contrastive Captioners are Image-Text Foundation Models |
| Model soups (BASIC-L) | - | 2440M | - | 90.98% | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| Model soups (ViT-G/14) | - | 1843M | - | 90.94% | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| DaViT-G | - | 1437M | - | 90.4% | DaViT: Dual Attention Vision Transformers |
| DaViT-H | - | 362M | - | 90.2% | DaViT: Dual Attention Vision Transformers |
| Meta Pseudo Labels (EfficientNet-L2) | 95040G | 480M | - | 90.2% | Meta Pseudo Labels |
| SwinV2-G | - | 3000M | - | 90.17% | Swin Transformer V2: Scaling Up Capacity and Resolution |
| MAWS (ViT-6.5B) | - | 6500M | - | 90.1% | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| InternImage-DCNv3-G (M3I Pre-training) | - | 3000M | - | 90.1% | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions |
| Florence-CoSwin-H | - | 893M | - | 90.05% | Florence: A New Foundation Model for Computer Vision |
| RevCol-H | - | 2158M | - | 90.0% | Reversible Column Networks |
| Meta Pseudo Labels (EfficientNet-B6-Wide) | - | 390M | - | 90% | Meta Pseudo Labels |
| MAWS (ViT-2B) | - | 2000M | - | 89.8% | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| EVA | - | 1000M | - | 89.7% | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
| M3I Pre-training (InternImage-H) | - | - | - | 89.6% | Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information |
| ViT-L/16 (384res, distilled from ViT-22B) | - | 307M | - | 89.6% | Scaling Vision Transformers to 22 Billion Parameters |
| InternImage-H | - | 1080M | - | 89.6% | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions |
| MaxViT-XL (512res, JFT) | - | - | - | 89.53% | MaxViT: Multi-Axis Vision Transformer |
| AIMv2-3B (448 res) | - | - | - | 89.5% | Multimodal Autoregressive Pre-training of Large Vision Encoders |