
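Abstract
Convolutional neural networks typically encode an input image into a series of intermediate features with gradually decreasing resolution. Although this structure suits classification, it performs poorly on tasks that require recognition and localization at the same time, such as object detection. Encoder-decoder architectures were proposed to address this by adding a decoder network on top of a backbone designed for classification. This paper argues that, because the scale of the backbone gradually decreases, encoder-decoder architectures are limited in their ability to generate strong multi-scale features. We therefore propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections, learned end to end on an object detection task through Neural Architecture Search. Using similar building blocks, SpineNet models outperform ResNet-FPN models by about 3% average precision (AP) at various scales while using 10% to 20% fewer FLOPs. In particular, without test-time augmentation, SpineNet-190 reaches 52.5% AP on COCO with a Mask R-CNN detector and 52.1% AP with a RetinaNet detector, significantly surpassing the previous best detection models. SpineNet also transfers to classification, improving Top-1 accuracy by 5% on the challenging iNaturalist fine-grained classification dataset. The code is open source at: https://github.com/tensorflow/tpu/tree/master/models/official/detection.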
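To make the idea of a cross-scale connection more concrete, the following is a minimal sketch of how feature maps at different resolutions might be brought to a common scale and merged. It is not the SpineNet implementation: the abstract does not describe the resampling operations, so everything here is an assumption for illustration, including the convention that pyramid level `l` has stride `2**l`, the use of nearest-neighbor upsampling and max pooling, the element-wise sum, and the hypothetical names `resample` and `merge_cross_scale`. Feature maps are single-channel nested lists to keep the example dependency-free.

```python
from typing import List, Tuple

# Hypothetical simplification: one channel, stored as a list of rows.
FeatureMap = List[List[float]]


def upsample_nearest(x: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every value `factor` times along both spatial axes."""
    out: FeatureMap = []
    for row in x:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def downsample_max(x: FeatureMap, factor: int) -> FeatureMap:
    """Max-pool over non-overlapping factor x factor windows.

    Assumes height and width are divisible by `factor`.
    """
    h, w = len(x), len(x[0])
    return [
        [
            max(x[i + di][j + dj] for di in range(factor) for dj in range(factor))
            for j in range(0, w, factor)
        ]
        for i in range(0, h, factor)
    ]


def resample(x: FeatureMap, src_level: int, dst_level: int) -> FeatureMap:
    """Move a map from level src_level to dst_level (assumed stride 2**level)."""
    if dst_level > src_level:  # target is coarser
        return downsample_max(x, 2 ** (dst_level - src_level))
    if dst_level < src_level:  # target is finer
        return upsample_nearest(x, 2 ** (src_level - dst_level))
    return [list(row) for row in x]


def merge_cross_scale(parents: List[Tuple[FeatureMap, int]], dst_level: int) -> FeatureMap:
    """Resample each (feature_map, level) parent to dst_level and sum element-wise."""
    resampled = [resample(fm, lvl, dst_level) for fm, lvl in parents]
    h, w = len(resampled[0]), len(resampled[0][0])
    return [[sum(r[i][j] for r in resampled) for j in range(w)] for i in range(h)]


# A 4x4 map at level 3 and a 2x2 map at level 4 (toy values).
fine = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coarse = [[10, 20], [30, 40]]

merged_coarse = merge_cross_scale([(fine, 3), (coarse, 4)], dst_level=4)
merged_fine = merge_cross_scale([(fine, 3), (coarse, 4)], dst_level=3)
print(merged_coarse)  # [[16, 28], [44, 56]]
```

Because the target level is a free parameter, the same two parents can feed a block at either resolution. A scale-permuted backbone needs this flexibility, since a block may be followed by one at a higher or a lower resolution.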
Code Repositories
lucifer443/SpineNet-Pytorch (PyTorch; mentioned in GitHub)
tensorflow/tpu (TensorFlow; official)
yan-roo/SpineNet-Pytorch (PyTorch; mentioned in GitHub)
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| image-classification-on-imagenet | SpineNet-143 | GFLOPs: 9.1; Number of params: 60.5M; Top 1 Accuracy: 79% |
| image-classification-on-inaturalist | SpineNet-143 | Top 1 Accuracy: 63.6%; Top 5 Accuracy: 84.8% |
| instance-segmentation-on-coco | Mask R-CNN (SpineNet-190, 1536x1536) | mask AP: 46.1 |
| instance-segmentation-on-coco-minival | RetinaNet (SpineNet-190, 1536x1536) | mask AP: 46.1 |
| object-detection-on-coco | RetinaNet (SpineNet-96, 1024x1024) | AP50: 68.4; AP75: 52.5; APL: 62; APM: 52.3; APS: 32; box mAP: 48.6 |
| object-detection-on-coco | RetinaNet (SpineNet-49S, 640x640) | AP50: 60.5; AP75: 44.6; APL: 58; APM: 45; APS: 23.3; box mAP: 41.5 |
| object-detection-on-coco | RetinaNet (SpineNet-49, 896x896) | AP50: 66.3; AP75: 50.6; APL: 61.7; APM: 50.1; APS: 29.1; box mAP: 46.7 |
| object-detection-on-coco | RetinaNet (SpineNet-49, 640x640) | AP50: 63.8; AP75: 47.6; APL: 61.1; APM: 47.7; APS: 25.9; box mAP: 44.3 |
| object-detection-on-coco | SpineNet-49 (640, RetinaNet, single-scale) | AP50: 62.3; AP75: 46.1; APL: 57.3; APM: 45.2; APS: 23.7; box mAP: 42.8 |
| object-detection-on-coco | RetinaNet (SpineNet-143, 1280x1280) | AP50: 70.4; AP75: 54.9; APL: 62.1; APM: 53.9; APS: 33.6; box mAP: 50.7 |
| object-detection-on-coco | RetinaNet (SpineNet-190, 1280x1280) | AP50: 71.8; AP75: 56.5; APL: 63.6; APM: 55; APS: 35.4; box mAP: 52.1 |
| object-detection-on-coco-minival | RetinaNet (SpineNet-190, 1536x1536) | box AP: 52.2 |