Object Detection on COCO 2017

Evaluation Metrics

mAP

Evaluation Results

Performance of each model on this benchmark

Model | mAP | Paper Title | Repository
UniRepLKNet-XL++ | 56.4 | UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition |
UniRepLKNet-L++ | 55.8 | UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition |
UniRepLKNet-B++ | 54.8 | UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition |
UniRepLKNet-S++ | 54.3 | UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition |
MixMIM-L | 54.1 | MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers |
UniRepLKNet-S | 53 | UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition |
MixMIM-B | 52.2 | MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers |
UniRepLKNet-T | 51.7 | UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition |
BiFormer-B (IN1k pretrain, MaskRCNN 12ep) | 48.6 | BiFormer: Vision Transformer with Bi-Level Routing Attention |
DeBiFormer-B (IN1k pretrain, MaskRCNN 12ep) | 48.5 | DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention |
BiFormer-S (IN1k pretrain, MaskRCNN 12ep) | 47.8 | BiFormer: Vision Transformer with Bi-Level Routing Attention |
DeBiFormer-S (IN1k pretrain, MaskRCNN 12ep) | 47.5 | DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention |
DeBiFormer-B (IN1k pretrain, Retina) | 47.1 | DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention |
DeBiFormer-S (IN1k pretrain, Retina) | 45.6 | DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention |
YOLO-Drone | 35.45 | YOLO-Drone: Airborne real-time detection of dense small objects from high-altitude perspective | -
DyHead (SAP) | - | Stochastic Subsampling With Average Pooling | -
Lpixel | - | Paint Transformer: Feed Forward Neural Painting with Stroke Prediction |
MaxViT-T | - | MaxViT: Multi-Axis Vision Transformer |
DAT-T++ | - | DAT++: Spatially Dynamic Vision Transformer with Deformable Attention |
MaxViT-S | - | MaxViT: Multi-Axis Vision Transformer |