Semantic Segmentation on ADE20K

Evaluation Metrics

- GFLOPs: compute cost of a forward pass, in billions of floating-point operations
- Params (M): number of model parameters, in millions
- Validation mIoU: mean Intersection over Union across classes on the ADE20K validation set
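To make the headline metric concrete, below is a minimal sketch of how mean IoU is typically computed from a per-pixel confusion matrix. The function names are illustrative and not taken from any leaderboard entry's code; the 150-class count is ADE20K's, and 255 as the ignore label is a common convention for its unlabeled pixels.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes=150, ignore_index=255):
    """Accumulate a num_classes x num_classes confusion matrix.

    pred, gt: integer label maps of the same shape.
    ignore_index: label to skip (commonly 255 for ADE20K's unlabeled pixels).
    """
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes**2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """mIoU = mean over classes of TP / (TP + FP + FN), skipping absent classes."""
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as class c but labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled class c but predicted otherwise
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return np.nanmean(iou)

# Toy usage: two 4x4 label maps over 3 classes.
gt = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 2, 2], [2, 2, 2, 2]])
pred = np.array([[0, 0, 1, 0], [0, 0, 1, 1], [2, 2, 1, 2], [2, 2, 2, 2]])
conf = confusion_matrix(pred, gt, num_classes=3)
print(f"mIoU: {mean_iou(conf):.4f}")
```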

Evaluation Results

Performance of each model on this benchmark:

| Model | GFLOPs | Params (M) | Validation mIoU | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- |
| ONE-PEACE | - | 1500 | 63.0 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | - |
| M3I Pre-training (InternImage-H) | - | 1310 | 62.9 | Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information | - |
| InternImage-H | 4635 | 1310 | 62.9 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | - |
| BEiT-3 | - | 1900 | 62.8 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | - |
| EVA | - | 1074 | 62.3 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | - |
| ViT-Adapter-L (Mask2Former, BEiTv2 pretrain) | - | 571 | 61.5 | Vision Transformer Adapter for Dense Predictions | - |
| FD-SwinV2-G | - | 3000 | 61.4 | Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation | - |
| RevCol-H (Mask2Former) | - | 2439 | 61.0 | Reversible Column Networks | - |
| Mask DINO (SwinL, multi-scale) | - | 223 | 60.8 | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | - |
| ViT-Adapter-L (Mask2Former, BEiT pretrain) | - | 571 | 60.5 | Vision Transformer Adapter for Dense Predictions | - |
| DINOv2 (ViT-g/14 frozen model, w/ ViT-Adapter + Mask2Former) | - | 1080 | 60.2 | DINOv2: Learning Robust Visual Features without Supervision | - |
| SwinV2-G (UperNet) | - | - | 59.9 | Swin Transformer V2: Scaling Up Capacity and Resolution | - |
| SERNet-Former | - | - | 59.35 | SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks | - |
| FocalNet-L (Mask2Former) | - | - | 58.5 | Focal Modulation Networks | - |
| ViT-Adapter-L (UperNet, BEiT pretrain) | - | 451 | 58.4 | Vision Transformer Adapter for Dense Predictions | - |
| RSSeg-ViT-L (BEiT pretrain) | - | 330 | 58.4 | Representation Separation for Semantic Segmentation with Vision Transformers | - |
| SeMask (SeMask Swin-L MSFaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation | - |
| SegViT-v2 (BEiT-v2-Large) | - | - | 58.2 | SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers | - |
| SeMask (SeMask Swin-L FaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation | - |
| DiNAT-L (Mask2Former) | - | - | 58.1 | Dilated Neighborhood Attention Transformer | - |
Showing the top 20 of 230 leaderboard entries.
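For reference, the Params (M) column is simply the trainable parameter count divided by 10^6. A minimal PyTorch sketch, using a stand-in model for illustration rather than any leaderboard entry:

```python
import torch.nn as nn

def params_in_millions(model: nn.Module) -> float:
    """Count trainable parameters, reported in millions as in the table."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Stand-in model for illustration only; leaderboard entries are far larger.
toy = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 150, 1))
print(f"{params_in_millions(toy):.3f} M parameters")
```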