Image Classification on ObjectNet

Evaluation Metric

Top-1 Accuracy
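
For reference, a minimal sketch of how Top-1 Accuracy is typically computed from a model's class scores; the `top1_accuracy` helper and the toy arrays below are illustrative, not part of the benchmark's official tooling:

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class equals the ground-truth label."""
    predictions = logits.argmax(axis=1)           # predicted class per sample
    return float((predictions == labels).mean())  # proportion of correct predictions

# Toy example: 3 samples, 4 classes; 2 of the 3 argmax predictions match the labels.
logits = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.3, 0.2, 0.4, 0.1],
                   [0.6, 0.1, 0.2, 0.1]])
labels = np.array([1, 2, 3])
print(top1_accuracy(logits, labels))  # -> 0.666...
```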

Evaluation Results

Performance of each model on this benchmark

| Model | Top-1 Accuracy | Paper Title |
| --- | --- | --- |
| CoCa | 82.7 | CoCa: Contrastive Captioners are Image-Text Foundation Models |
| LiT | 82.5 | LiT: Zero-Shot Transfer with Locked-image text Tuning |
| BASIC | 82.3 | Combined Scaling for Zero-shot Transfer Learning |
| EVA-02-CLIP-E/14+ | 79.6 | EVA-CLIP: Improved Training Techniques for CLIP at Scale |
| Baseline (ViT-G/14) | 79.03 | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| Model soups (ViT-G/14) | 78.52 | Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time |
| MAWS (ViT-6.5B) | 77.9 | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| MAWS (ViT-2B) | 75.8 | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| MAWS (ViT-H) | 72.6 | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| CLIP | 72.3 | Learning Transferable Visual Models From Natural Language Supervision |
| ALIGN | 72.2 | Combined Scaling for Zero-shot Transfer Learning |
| WiSE-FT | 72.1 | Robust fine-tuning of zero-shot models |
| ViT-e | 72.0 | PaLI: A Jointly-Scaled Multilingual Language-Image Model |
| ViT-G/14 | 70.53 | Scaling Vision Transformers |
| SWAG (ViT H/14) | 69.5 | Revisiting Weakly Supervised Pre-Training of Visual Perception Models |
| NS (Eff.-L2) | 68.5 | Scaling Vision Transformers |
| RegNetY 128GF (Platt) | 64.3 | Revisiting Weakly Supervised Pre-Training of Visual Perception Models |
| LLE (ViT-H/14, MAE, Edge Aug) | 60.78 | A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others |
| SEER (RegNet10B) | 60.2 | Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision |
| ViT H/14 (Platt) | 60 | Revisiting Weakly Supervised Pre-Training of Visual Perception Models |