
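Abstract
This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous, more complex multi-stage methods. The end-to-end training framework gradually improves the quality of the pseudo-labels over the course of the curriculum, and the increasingly accurate pseudo-labels in turn benefit the training of the object detector. Within this framework, we further propose two simple yet effective techniques. The first is a "soft teacher" mechanism, which weights the classification loss of each unlabeled bounding box by the classification score the teacher network produces for it. The second is a "box jittering" strategy, which selects reliable pseudo boxes for learning the box-regression branch. On the COCO benchmark, the approach outperforms previous state-of-the-art methods by a large margin at every labeling ratio tested (1%, 5%, and 10%). It also performs well when labeled data is relatively plentiful. For example, starting from a baseline detector trained on the full COCO training set (40.9 mAP), it gains +3.6 mAP, reaching 44.5 mAP, using only the 123K unlabeled images of COCO. On a state-of-the-art Swin Transformer-based detector (58.9 mAP on test-dev), it still improves detection accuracy by +1.5 mAP to 60.4 mAP, and instance segmentation accuracy by +1.2 mAP to 52.4 mAP. When further combined with an Object365-pretrained model, detection accuracy reaches 61.3 mAP and instance segmentation accuracy reaches 53.0 mAP, setting a new state of the art.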
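The two techniques named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from microsoft/SoftTeacher. Per-box losses are passed as plain floats. `refine` is a hypothetical stand-in for the teacher's box-regression head. Several details go beyond what the abstract states and are assumptions: only background-assigned boxes are weighted, by the teacher's background score; each pseudo box is jittered 10 times with a relative scale of 0.06; coordinate standard deviations are normalized by half the box's width plus height; and a pseudo box is kept for regression below a 0.02 threshold. All function names are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def soft_teacher_cls_loss(fg_losses: Sequence[float],
                          bg_losses: Sequence[float],
                          bg_teacher_scores: Sequence[float]) -> float:
    """Combine per-box classification losses on one unlabeled image.

    Assumed scheme: foreground-assigned boxes are averaged uniformly, while
    background-assigned boxes are weighted by the teacher's background score,
    normalized so the weights sum to 1.
    """
    fg_term = sum(fg_losses) / max(len(fg_losses), 1)
    total = sum(bg_teacher_scores)
    if total <= 0:
        return fg_term
    bg_term = sum(s / total * loss for s, loss in zip(bg_teacher_scores, bg_losses))
    return fg_term + bg_term


def jitter_box(box: Box, scale: float, rng: random.Random) -> Box:
    """Shift each coordinate by a uniform offset relative to box width/height."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 + rng.uniform(-scale, scale) * w,
            y1 + rng.uniform(-scale, scale) * h,
            x2 + rng.uniform(-scale, scale) * w,
            y2 + rng.uniform(-scale, scale) * h)


def regression_variance(box: Box,
                        refine: Callable[[Box], Box],
                        num_jitter: int = 10,
                        jitter_scale: float = 0.06,
                        rng: Optional[random.Random] = None) -> float:
    """Mean normalized std of the refined coordinates of jittered copies."""
    if rng is None:
        rng = random.Random(0)
    refined = [refine(jitter_box(box, jitter_scale, rng)) for _ in range(num_jitter)]
    x1, y1, x2, y2 = box
    norm = 0.5 * ((x2 - x1) + (y2 - y1))  # assumed normalizer
    if norm <= 0:
        return float("inf")  # degenerate box: never treated as reliable
    stds = [statistics.pstdev([r[k] for r in refined]) / norm for k in range(4)]
    return sum(stds) / 4


def select_reg_pseudo_boxes(boxes: Sequence[Box],
                            refine: Callable[[Box], Box],
                            threshold: float = 0.02) -> List[Box]:
    """Keep only pseudo boxes whose refined positions are stable under jitter."""
    return [b for b in boxes if regression_variance(b, refine) < threshold]
```

For example, a `refine` that maps every jittered copy back to the same box yields a variance of 0, so the box is kept for regression. A `refine` that simply echoes its input leaves the jitter noise in place and yields a positive variance.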
Code repositories
- microsoft/SoftTeacher (official, PyTorch)
- amazon-research/bigdetection (PyTorch, mentioned in GitHub)
- JCZ404/Semi-DETR (PyTorch, mentioned in GitHub)
- hikvision-research/SSOD (PyTorch, mentioned in GitHub)
- hik-lab/ssod (PyTorch, mentioned in GitHub)
- amazon-science/bigdetection (PyTorch, mentioned in GitHub)
- lexisnexis-risk-open-source/ledetection (PyTorch, mentioned in GitHub)
- hattrickcr7/SoftTeacher (PyTorch, mentioned in GitHub)
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| instance-segmentation-on-coco | Soft Teacher + Swin-L (HTC++, multi-scale) | mask AP: 53.0 |
| instance-segmentation-on-coco-minival | Soft Teacher + Swin-L (HTC++, single-scale) | mask AP: 51.9 |
| instance-segmentation-on-coco-minival | Soft Teacher + Swin-L (HTC++, multi-scale) | mask AP: 52.5 |
| object-detection-on-coco | Soft Teacher + Swin-L (HTC++, multi-scale) | box mAP: 61.3 |
| object-detection-on-coco-minival | Soft Teacher + Swin-L (HTC++, single-scale) | box AP: 60.1 |
| object-detection-on-coco-minival | Soft Teacher + Swin-L (HTC++, multi-scale) | box AP: 60.7 |
| semi-supervised-object-detection-on-coco-1 | Soft Teacher + Swin-L (HTC++, multi-scale) | mAP: 20.46 |
| semi-supervised-object-detection-on-coco-10 | Soft Teacher (detector: FasterRCNN-Res50) | mAP: 34.04 |
| semi-supervised-object-detection-on-coco-100 | Soft Teacher | mAP: 44.9 |
| semi-supervised-object-detection-on-coco-5 | Soft Teacher + Swin-L (HTC++, multi-scale) | mAP: 30.74 |