Instance Segmentation on COCO minival

Evaluation Metrics

APL
APM
APS
mask AP
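
For orientation, the size-specific metrics above follow the standard COCO evaluation convention: APS, APM, and APL restrict scoring to small, medium, and large objects by pixel area, and mask AP is averaged over ten mask-IoU thresholds. The sketch below is illustrative only. The names `size_bucket` and `IOU_THRESHOLDS` are hypothetical, and the handling of areas that fall exactly on a boundary is a simplifying assumption, not the official evaluator's behavior.

```python
# Area thresholds (in pixels) from the COCO evaluation convention.
SMALL_MAX_AREA = 32 ** 2    # objects below this area count toward APS
MEDIUM_MAX_AREA = 96 ** 2   # objects below this (and not small) count toward APM


def size_bucket(mask_area: float) -> str:
    """Return the size-specific metric an object of this area contributes to.

    Hypothetical helper; boundary handling is simplified to half-open ranges.
    """
    if mask_area < SMALL_MAX_AREA:
        return "APS"
    if mask_area < MEDIUM_MAX_AREA:
        return "APM"
    return "APL"


# Mask-IoU thresholds over which mask AP is averaged: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]
```

For example, `size_bucket(5000)` places a 5000-pixel mask in the medium range, so it would contribute to APM.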

Evaluation Results

Performance of each model on this benchmark

| Model | APL | APM | APS | mask AP | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- |
| Co-DETR | 74.6 | 59.7 | 38.9 | 56.6 | DETRs with Collaborative Hybrid Assignments Training | |
| ViT-CoMer-L (Mask RCNN, DINOv2) | - | - | - | 55.9 | ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | - |
| InternImage-H | 74.4 | 58.4 | 37.9 | 55.4 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | |
| EVA | 72.0 | 58.4 | 37.6 | 55.0 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | |
| Mask Frozen-DETR | 72.9 | 58.4 | 37.2 | 54.9 | Mask Frozen-DETR: High Quality Instance Segmentation with One GPU | - |
| Mask DINO (SwinL, multi-scale) | - | - | - | 54.5 | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | |
| ViT-Adapter-L (HTC++, BEiTv2, O365, multi-scale) | - | - | - | 54.2 | Vision Transformer Adapter for Dense Predictions | |
| GLEE-Pro | - | - | - | 54.2 | General Object Foundation Model for Images and Videos at Scale | |
| SwinV2-G (HTC++) | - | - | - | 53.7 | Swin Transformer V2: Scaling Up Capacity and Resolution | |
| ViTDet, ViT-H Cascade (multiscale) | - | - | - | 53.1 | Exploring Plain Vision Transformer Backbones for Object Detection | |
| GLEE-Plus | - | - | - | 53.0 | General Object Foundation Model for Images and Videos at Scale | |
| Mask DINO (SwinL) | - | - | - | 52.6 | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | |
| ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale) | - | - | - | 52.5 | Vision Transformer Adapter for Dense Predictions | |
| Soft Teacher + Swin-L (HTC++, multi-scale) | - | - | - | 52.5 | End-to-End Semi-Supervised Object Detection with Soft Teacher | |
| ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale) | - | - | - | 52.2 | Vision Transformer Adapter for Dense Predictions | |
| ViTDet, ViT-H Cascade | - | - | - | 52 | Exploring Plain Vision Transformer Backbones for Object Detection | |
| Soft Teacher + Swin-L (HTC++, single-scale) | - | - | - | 51.9 | End-to-End Semi-Supervised Object Detection with Soft Teacher | |
| CBNetV2 (Dual-Swin-L HTC, multi-scale) | - | - | - | 51.8 | CBNet: A Composite Backbone Network Architecture for Object Detection | |
| Frozen Backbone, SwinV2-G-ext22K (HTC) | - | - | - | 51.6 | Could Giant Pretrained Image Models Extract Universal Representations? | - |
| CBNetV2 (Dual-Swin-L HTC, multi-scale) | - | - | - | 51 | CBNet: A Composite Backbone Network Architecture for Object Detection | |