
Abstract
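Building instance segmentation models that are data-efficient and can handle rare object categories is an important challenge in computer vision. Data augmentation is a promising direction for addressing it. This paper presents a systematic study of the Copy-Paste augmentation ([13, 12]) for instance segmentation, which augments training data by randomly pasting objects onto images. Prior work on Copy-Paste relied on modeling the surrounding visual context to guide where objects are pasted, but we find that the simple mechanism of pasting objects at random is sufficient and yields significant gains on top of strong baselines. Furthermore, we show that Copy-Paste is additive with semi-supervised methods that bring in extra data through pseudo-labeling (e.g., self-training). On COCO instance segmentation, we achieve 49.1 mask AP and 57.3 box AP, improvements of +0.6 mask AP and +1.5 box AP over the previous state of the art. We further demonstrate that Copy-Paste brings significant improvements on the LVIS benchmark: our baseline model outperforms the LVIS 2020 Challenge winning entry by +3.6 mask AP on rare categories.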
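To make the random-pasting idea concrete, here is a minimal pure-Python sketch; it is an illustration under stated assumptions, not the authors' implementation. The function name `copy_paste`, the selection probability `p`, the nested-list image format, and the requirement that both images share the same size are all hypothetical simplifications. The handling of occluded destination objects (trimming their masks and dropping fully hidden ones) is likewise an assumption of this sketch. No resizing, flipping, or blending is performed.

```python
import random


def copy_paste(src_img, src_masks, dst_img, dst_masks, rng, p=0.5):
    """Paste a random subset of source objects onto a destination image.

    Images are H x W nested lists of pixel values; each mask is an H x W
    nested list of 0/1. Both images are assumed to have the same size.
    """
    h, w = len(dst_img), len(dst_img[0])

    # Pick objects independently at random; no visual context is consulted.
    chosen = [m for m in src_masks if rng.random() < p]

    out_img = [row[:] for row in dst_img]  # copy, so dst_img is not mutated
    pasted = [[0] * w for _ in range(h)]   # union of all pasted masks
    for m in chosen:
        for y in range(h):
            for x in range(w):
                if m[y][x]:
                    out_img[y][x] = src_img[y][x]
                    pasted[y][x] = 1

    # Assumption: destination objects lose the pixels that were pasted over,
    # and objects with no visible pixel left are dropped.
    out_masks = []
    for m in dst_masks:
        visible = [[1 if m[y][x] and not pasted[y][x] else 0 for x in range(w)]
                   for y in range(h)]
        if any(any(row) for row in visible):
            out_masks.append(visible)

    # Pasted objects keep their source masks as new ground-truth instances.
    out_masks.extend([row[:] for row in m] for m in chosen)
    return out_img, out_masks


# Toy 2 x 2 example: one source object, two destination objects.
src_img = [[9, 9], [9, 9]]
src_masks = [[[1, 0], [0, 0]]]
dst_img = [[1, 2], [3, 4]]
dst_masks = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]

new_img, new_masks = copy_paste(
    src_img, src_masks, dst_img, dst_masks, random.Random(0), p=1.0
)
print(new_img)
print(len(new_masks))
```

With `p=1.0` every source object is pasted, which makes the toy example deterministic; lower values paste a random subset. In a real pipeline the masks would typically be NumPy arrays or tensors, and bounding boxes would be recomputed from the updated masks.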
Code repositories
open-mmlab/mmdetection
pytorch
RocketFlash/CAP_augmentation
Mentioned on GitHub
conradry/copy-paste-aug
pytorch
Mentioned on GitHub
PaddlePaddle/PaddleOCR
paddle
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| instance-segmentation-on-coco | Cascade Eff-B7 NAS-FPN (1280) | mask AP: 46.9 |
| instance-segmentation-on-coco | Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) | mask AP: 49.1 |
| instance-segmentation-on-coco-minival | Cascade Eff-B7 NAS-FPN (1280) | mask AP: 46.8 |
| instance-segmentation-on-coco-minival | Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) | mask AP: 48.9 |
| instance-segmentation-on-lvis-v1-0-val | Eff-B7 NAS-FPN (1280, Copy-Paste pre-training) | mask AP: 38.1 |
| object-detection-on-coco | Cascade Eff-B7 NAS-FPN (1280) | box mAP: 54.8 |
| object-detection-on-coco | Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) | box mAP: 57.3 |
| object-detection-on-coco-minival | Cascade Eff-B7 NAS-FPN (1280) | box AP: 54.5 |
| object-detection-on-coco-minival | Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) | box AP: 57.0 |
| object-detection-on-lvis-v1-0-val | Eff-B7 NAS-FPN (1280, Copy-Paste pre-training) | box AP: 41.6 |
| object-detection-on-pascal-voc-2007 | Cascade Eff-B7 NAS-FPN (Copy Paste pre-training, single-scale) | mAP: 89.3% |
| semantic-segmentation-on-pascal-voc-2012-val | Eff-B7 NAS-FPN (Copy-Paste pre-training, single-scale) | mIoU: 86.6% |