Video Object Detection on ImageNet VID

Evaluation Metrics

mAP (mean average precision)
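
The page names the metric but does not describe how it is computed. As a rough illustration only, the sketch below computes average precision for one class from ranked detections and then averages over classes. It assumes each detection has already been matched against ground truth (the matching rule and IoU threshold are not given here), and it uses all-point interpolation of the precision-recall curve, which may differ from the benchmark's official evaluation code. The function names and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple

# A detection is (confidence score, whether it matched a ground-truth box).
# The matching step itself is assumed to have happened upstream.
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_ground_truth: int) -> float:
    """Area under the precision-recall curve for one class (all-point interpolation)."""
    if num_ground_truth == 0:
        return 0.0
    # Rank detections by confidence, highest first.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # Replace each precision with the best precision at any higher recall,
    # so the curve never increases as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangle areas wherever recall increases.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Unweighted mean of per-class AP; per_class maps class name to (detections, GT count)."""
    if not per_class:
        return 0.0
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps)
```

The functions return values in the range 0 to 1. The scores listed on this page (for example 93.2) appear to be the same quantity expressed as a percentage.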

Evaluation Results

Performance of each model on this benchmark

| Model | mAP | Paper Title | Repository |
|---|---|---|---|
| YOLOV++ | 93.2 | Practical Video Object Detection via Feature Selection and Aggregation | |
| DiffusionVID (Swin-B) | 92.5 | DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection | - |
| Ours (Def. DETR + SwinB) | 91.3 | Objects do not disappear: Video object detection by single-frame object location anticipation | |
| VSTAM | 91.1 | Video Sparse Transformer With Attention-Guided Memory for Video Object Detection | - |
| TGBFormer (Swin B) | 90.3 | TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection | - |
| TransVOD (Swin Base) | 90.1 | TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers | |
| PTSEFormer (ResNet-101) | 88.1 | PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection | |
| Ours (Def. DETR + R101) | 87.9 | Objects do not disappear: Video object detection by single-frame object location anticipation | |
| YOLOV | 87.5 | YOLOV: Making Still Image Object Detectors Great at Video Object Detection | |
| Ours (Faster RCNN + R101) | 87.2 | Objects do not disappear: Video object detection by single-frame object location anticipation | |
| DiffusionVID (ResNet-101) | 87.1 | DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection | - |
| DAFA-F (ResNeXt-101) | 85.9 | DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection | - |
| ClipVID | 85.8 | Identity-Consistent Aggregation for Video Object Detection | |
| HVRNet (ResNeXt101-32x4d) | 85.5 | Mining Inter-Video Proposal Relations for Video Object Detection | - |
| MEGA (ResNeXt101) | 85.4 | Memory Enhanced Global-Local Aggregation for Video Object Detection | |
| BoxMask (ResNeXt101) | 84.8 | BoxMask: Revisiting Bounding Box Supervision for Video Object Detection | - |
| DAFA-F (ResNet-101) | 84.5 | DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection | - |
| SELSA (ResNeXt-101) | 84.3 | Sequence Level Semantics Aggregation for Video Object Detection | |
| Temporal ROI Align (ResNeXt101) | 84.3 | Temporal RoI Align for Video Object Recognition | |
| REPP + SELSA (ResNet-101) | 84.2 | Robust and Efficient Post-Processing for Video Object Detection (REPP) | - |