Self Supervised Image Classification On 1

Evaluation Metrics

Number of Params
Top 1 Accuracy
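
The two metrics are standard: Top 1 Accuracy is the fraction of validation images whose highest-scoring predicted class matches the ground-truth label, and Number of Params is the learnable-parameter count of the evaluated model. Below is a minimal sketch of both, assuming PyTorch tensors and an arbitrary torch.nn.Module; the function names are illustrative and not part of any benchmark API.

```python
import torch

def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    preds = logits.argmax(dim=1)
    return (preds == labels).float().mean().item()

def num_params(model: torch.nn.Module) -> int:
    """Learnable-parameter count, comparable to the 'Number of Params' column
    (the table reports this value in millions, e.g. 307M)."""
    return sum(p.numel() for p in model.parameters())
```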

Evaluation Results

Performance of each model on this benchmark:

| Model | Number of Params | Top 1 Accuracy | Paper Title | Repository |
|---|---|---|---|---|
| DINOv2 (ViT-g/14, 448) | 1100M | 88.9% | DINOv2: Learning Robust Visual Features without Supervision | |
| PercMAE (ViT-L, dVAE) | 307M | 88.6% | Improving Visual Representation Learning through Perceptual Understanding | |
| DINOv2 (ViT-g/14) | 1100M | 88.5% | DINOv2: Learning Robust Visual Features without Supervision | |
| PeCo (ViT-H/14, 448) | 632M | 88.3% | PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | |
| PercMAE (ViT-L) | 307M | 88.1% | Improving Visual Representation Learning through Perceptual Understanding | |
| dBOT (ViT-H/14) | 632M | 88.0% | Exploring Target Representations for Masked Autoencoders | |
| MAE (ViT-H/14, 448) | 632M | 87.8% | Masked Autoencoders Are Scalable Vision Learners | |
| iBOT (ViT-L/16, 512) | 307M | 87.8% | iBOT: Image BERT Pre-Training with Online Tokenizer | |
| MAE + AugSub finetune (ViT-H/14) | 632M | 87.2% | Masking meets Supervision: A Strong Learning Alliance | |
| SimMIM (SwinV2-H, 512) | 658M | 87.1% | SimMIM: A Simple Framework for Masked Image Modeling | |
| MAE (ViT-H/14) | - | 86.9% | Masked Autoencoders Are Scalable Vision Learners | |
| iBOT (ViT-L/16) | 307M | 86.6% | iBOT: Image BERT Pre-Training with Online Tokenizer | |
| TEC_MAE (ViT-L/16, 224) | - | 86.5% | Towards Sustainable Self-supervised Learning | |
| BEiT-L (ViT) | 307M | 86.3% | BEiT: BERT Pre-Training of Image Transformers | |
| CAE (ViT-L/16) | 307M | 86.3% | Context Autoencoder for Self-Supervised Representation Learning | |
| MIRL (ViT-B-48) | 341M | 86.2% | Masked Image Residual Learning for Scaling Deeper Vision Transformers | |
| MAE + AugSub finetune (ViT-L/16) | 304M | 86.1% | Masking meets Supervision: A Strong Learning Alliance | |
| SparK (ConvNeXt-Large, 384) | 198M | 86.0% | Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling | |
| BootMAE (ViT-L) | 307M | 85.9% | Bootstrapped Masked Autoencoders for Vision BERT Pretraining | |
| SEER (Regnet10B) | 10000M | 85.8% | Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | |