Self-Supervised Image Classification On

Evaluation Metrics

Number of Params

Top 1 Accuracy

Evaluation Results

Performance of each model on this benchmark

Model	Number of Params	Top 1 Accuracy	Paper Title
DINOv2+reg (ViT-g/14)	1100M	87.1%	Vision Transformers Need Registers
DINOv2 (ViT-g/14 @448)	1100M	86.7%	DINOv2: Learning Robust Visual Features without Supervision
DINOv2 (ViT-g/14)	1100M	86.5%	DINOv2: Learning Robust Visual Features without Supervision
DINOv2 distilled (ViT-L/14)	307M	86.3%	DINOv2: Learning Robust Visual Features without Supervision
MIM-Refiner (D2V2-ViT-H/14)	632M	84.7%	MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
DINOv2 distilled (ViT-B/14)	85M	84.5%	DINOv2: Learning Robust Visual Features without Supervision
MIM-Refiner (MAE-ViT-2B/14)	1890M	84.5%	MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
MIM-Refiner (MAE-ViT-H/14)	632M	83.7%	MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
MIM-Refiner (D2V2-ViT-L/16)	307M	83.5%	MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
MIM-Refiner (MAE-ViT-L/16)	307M	82.8%	MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
iBOT (ViT-L/16) (IN22k)	307M	82.3%	iBOT: Image BERT Pre-Training with Online Tokenizer
MAE-CT (ViT-H/16)	632M	82.2%	Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Mugs (ViT-L/16)	307M	82.1%	Mugs: A Multi-Granular Self-Supervised Learning Framework
MAE-CT (ViT-L/16)	307M	81.5%	Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
EsViT (Swin-B)	87M	81.3%	Efficient Self-supervised Vision Transformers for Representation Learning
iBOT (ViT-L/16)	307M	81.3%	iBOT: Image BERT Pre-Training with Online Tokenizer
DINOv2 distilled (ViT-S/14)	21M	81.1%	DINOv2: Learning Robust Visual Features without Supervision
MoCo v3 (ViT-BN-L/7)	304M	81.0%	An Empirical Study of Training Self-Supervised Vision Transformers
EsViT (Swin-S)	49M	80.8%	Efficient Self-supervised Vision Transformers for Representation Learning
MSN (ViT-L/7)	306M	80.7%	Masked Siamese Networks for Label-Efficient Learning
