
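Abstract
BiSeNet has proven to be a popular two-stream network for real-time segmentation. However, its principle of adding an extra path to encode spatial information is time-consuming, and backbones borrowed from pretraining tasks such as image classification may be inefficient for image segmentation because they lack task-specific design. To address these problems, we propose a novel and efficient structure named the Short-Term Dense Concatenate network (STDC network), which removes structural redundancy. Specifically, we gradually reduce the dimension of the feature maps and aggregate them for image representation; this forms the basic module of the STDC network. In the decoder, we propose a Detail Aggregation module that integrates the learning of spatial information into the low-level layers in a single-stream manner. Finally, the low-level features and deep features are fused to predict the final segmentation result. Extensive experiments on the Cityscapes and CamVid datasets show that our method achieves a promising trade-off between segmentation accuracy and inference speed. On Cityscapes, we achieve 71.9% mIoU on the test set at 250.4 FPS on an NVIDIA GTX 1080Ti, which is 45.2% faster than the latest methods, and 76.8% mIoU at 97.0 FPS when inferring on higher-resolution images.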
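The abstract's "gradually reduce the dimension of the feature maps and aggregate them" can be illustrated with a small channel-budget sketch. The halving schedule below is an assumption made for illustration: block i is taken to output `out_channels / 2**i` channels, with the last block repeating the previous width so that the concatenation has exactly `out_channels` channels. The abstract does not state these numbers, and the function name `stdc_block_widths` is hypothetical.

```python
from typing import List


def stdc_block_widths(out_channels: int, num_blocks: int = 4) -> List[int]:
    """Channel width of each block in one STDC-style module (illustrative).

    Assumption (not stated in the abstract): block i (1-based) outputs
    out_channels / 2**i channels, and the last block repeats the width of
    the block before it, so the concatenated output has out_channels channels.
    """
    if num_blocks < 2:
        raise ValueError("need at least two blocks")
    if out_channels % (2 ** (num_blocks - 1)) != 0:
        raise ValueError("out_channels must be divisible by 2**(num_blocks - 1)")
    # Halve the width at every block except the last one.
    widths = [out_channels // 2 ** i for i in range(1, num_blocks)]
    # The last block keeps the previous width so the widths sum to out_channels.
    widths.append(widths[-1])
    return widths


widths = stdc_block_widths(256)
print(widths)       # [128, 64, 32, 32]
print(sum(widths))  # 256
```

Under this assumed schedule, later blocks get fewer channels, and concatenating all block outputs recovers the full output width without a separate wide layer.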
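Code repositories
| Repository | Framework | Notes |
|---|---|---|
| PaddlePaddle/PaddleSeg | paddle | |
| MichaelFan01/STDC-Seg | pytorch | Official; mentioned in GitHub |
| zh320/realtime-semantic-segmentation-pytorch | pytorch | Mentioned in GitHub |
| Deci-AI/super-gradients | pytorch | Mentioned in GitHub |
| pideyi1025/DeepLabV3Plus-RailSem19 | pytorch | Mentioned in GitHub |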
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| dichotomous-image-segmentation-on-dis-te1 | STDC | E-measure: 0.798; HCE: 249; MAE: 0.090; S-Measure: 0.723; max F-Measure: 0.648; weighted F-measure: 0.562 |
| dichotomous-image-segmentation-on-dis-te2 | STDC | E-measure: 0.834; HCE: 556; MAE: 0.092; S-Measure: 0.759; max F-Measure: 0.720; weighted F-measure: 0.636 |
| dichotomous-image-segmentation-on-dis-te3 | STDC | E-measure: 0.855; HCE: 1081; MAE: 0.090; S-Measure: 0.771; max F-Measure: 0.745; weighted F-measure: 0.662 |
| dichotomous-image-segmentation-on-dis-te4 | STDC | E-measure: 0.841; HCE: 3819; MAE: 0.102; S-Measure: 0.762; max F-Measure: 0.731; weighted F-measure: 0.652 |
| dichotomous-image-segmentation-on-dis-vd | STDC | E-measure: 0.817; HCE: 1598; MAE: 0.103; S-Measure: 0.740; max F-Measure: 0.696; weighted F-measure: 0.613 |
| real-time-semantic-segmentation-on-cityscapes | STDC2-75 | Frame (fps): 97.0 (1080Ti); mIoU: 76.8% |
| real-time-semantic-segmentation-on-cityscapes | STDC2-50 | Frame (fps): 188.6; mIoU: 73.4% |
| real-time-semantic-segmentation-on-cityscapes | STDC1-50 | Frame (fps): 250.4 (1080Ti); mIoU: 71.9% |
| real-time-semantic-segmentation-on-cityscapes | STDC1-75 | Frame (fps): 126.7; mIoU: 75.3% |
| real-time-semantic-segmentation-on-cityscapes-1 | STDC1-Seg75 | Frame (fps): 126.7; mIoU: 74.5% |
| real-time-semantic-segmentation-on-cityscapes-1 | STDC2-Seg75 | Frame (fps): 97; mIoU: 77% |
| semantic-segmentation-on-bdd100k-val | STDC1 | mIoU: 52.1 (45.8 FPS) |
| semantic-segmentation-on-bdd100k-val | STDC2 | mIoU: 53.8 (33.0 FPS) |
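
Most of the semantic-segmentation rows above report mIoU. As a reminder of how that figure is commonly computed, here is a minimal sketch that derives mean intersection-over-union from a confusion matrix. The function name `mean_iou` and the toy matrix are made up for illustration; they are not taken from the paper or from any benchmark's official evaluation script, which may differ in details such as how ignored labels are handled.

```python
from typing import List


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean IoU from a square confusion matrix.

    Rows index the ground-truth class, columns the predicted class.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # class-c pixels predicted as another class
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # other pixels predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example (hypothetical pixel counts).
toy = [[8, 2],
       [1, 9]]
print(round(mean_iou(toy), 4))  # 0.7386
```

In the toy case the per-class IoUs are 8/11 and 9/12, and their mean is about 0.7386.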