Juncai Peng, Yi Liu, Shiyu Tang, Yuying Hao, Lutao Chu, Guowei Chen, Zewu Wu, Zeyu Chen, Zhiliang Yu, Yuning Du, Qingqing Dang, Baohua Lai, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma

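Abstract
Real-world applications place high demands on semantic segmentation methods. Although deep learning has driven remarkable progress in semantic segmentation, the performance of existing real-time methods remains unsatisfactory. In this work, we propose PP-LiteSeg, a novel lightweight model for real-time semantic segmentation. Specifically, we design a Flexible and Lightweight Decoder (FLD) that reduces the computation overhead of conventional decoders. To strengthen feature representations, we propose a Unified Attention Fusion Module (UAFM), which uses spatial and channel attention to produce a weight and then fuses the input features with that weight. In addition, we introduce a Simple Pyramid Pooling Module (SPPM) that aggregates global context at low computation cost. Extensive experiments show that PP-LiteSeg achieves a superior trade-off between accuracy and inference speed. On the Cityscapes test set, PP-LiteSeg achieves 72.0% mIoU at 273.6 FPS and 77.5% mIoU at 102.6 FPS on an NVIDIA GTX 1080Ti. Source code and pretrained models are open source and available in the PaddleSeg project: https://github.com/PaddlePaddle/PaddleSeg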
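The abstract describes UAFM only at a high level: attention produces a weight, and the weight fuses two input features. The following is a minimal, dependency-free sketch of what a spatial-attention variant of such a fusion could look like. It is an illustration under stated assumptions, not the PaddleSeg implementation: the function names, the flat-list feature layout, the use of per-pixel channel mean and max as attention inputs, the four scalar `weights` standing in for a learned convolution, and the convex combination `alpha * up + (1 - alpha) * low` are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_stats(feature):
    """Return per-pixel mean and max across channels.

    `feature` is a list of C channels; each channel is a flat list of
    N pixel values (hypothetical layout chosen for this sketch).
    """
    n_channels = len(feature)
    pixels = list(zip(*feature))  # N tuples, each holding C values
    means = [sum(p) / n_channels for p in pixels]
    maxes = [max(p) for p in pixels]
    return means, maxes


def fuse_with_spatial_attention(f_up, f_low, weights, bias=0.0):
    """Fuse two same-shaped features with one attention weight per pixel.

    `weights` holds four scalars that stand in for a learned
    convolution over the four statistic maps (an assumption).
    """
    up_mean, up_max = channel_stats(f_up)
    low_mean, low_max = channel_stats(f_low)

    # One attention value per pixel, squashed into (0, 1).
    alpha = [
        sigmoid(weights[0] * a + weights[1] * b
                + weights[2] * c + weights[3] * d + bias)
        for a, b, c, d in zip(up_mean, up_max, low_mean, low_max)
    ]

    # Convex combination of the two inputs, applied to every channel.
    fused = [
        [al * u + (1.0 - al) * l for al, u, l in zip(alpha, ch_up, ch_low)]
        for ch_up, ch_low in zip(f_up, f_low)
    ]
    return fused, alpha


# Usage: two channels, two pixels; zero weights give alpha = 0.5 everywhere,
# so the result is the plain average of the two inputs.
f_up = [[1.0, 2.0], [3.0, 4.0]]
f_low = [[0.0, 0.0], [1.0, 2.0]]
fused, alpha = fuse_with_spatial_attention(f_up, f_low, [0.0, 0.0, 0.0, 0.0])
```

Because `alpha` lies strictly between 0 and 1, each fused value stays between the two corresponding input values. A channel-attention variant would instead produce one weight per channel rather than one per pixel.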
Code repositories
PaddlePaddle/PaddleSeg (official, paddle, mentioned on GitHub)
zh320/realtime-semantic-segmentation-pytorch (pytorch, mentioned on GitHub)
Deci-AI/super-gradients (pytorch, mentioned on GitHub)
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| real-time-semantic-segmentation-on-camvid | PP-LiteSeg-B | FPS: 154.8; mIoU: 75% |
| real-time-semantic-segmentation-on-camvid | PP-LiteSeg-T | FPS: 222.3; mIoU: 73.3% |
| real-time-semantic-segmentation-on-cityscapes | PP-LiteSeg-B1 | FPS: 195.3 (1080Ti); mIoU: 73.9% |
| real-time-semantic-segmentation-on-cityscapes | PP-LiteSeg-B2 | FPS: 102.6 (1080Ti); mIoU: 77.5% |
| real-time-semantic-segmentation-on-cityscapes | PP-LiteSeg-T1 | FPS: 273.6 (1080Ti); mIoU: 72.0% |
| real-time-semantic-segmentation-on-cityscapes | PP-LiteSeg-T2 | FPS: 143.6 (1080Ti); mIoU: 74.9% |
| real-time-semantic-segmentation-on-cityscapes-1 | PP-LiteSeg-T1 | mIoU: 73.1% |
| real-time-semantic-segmentation-on-cityscapes-1 | PP-LiteSeg-B2 | mIoU: 78.2% |
| real-time-semantic-segmentation-on-cityscapes-1 | PP-LiteSeg-T2 | mIoU: 76% |
| real-time-semantic-segmentation-on-cityscapes-1 | PP-LiteSeg-B1 | mIoU: 75.3% |
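
The table reports accuracy as mIoU (mean intersection over union). As a reminder of what that number measures, here is a small sketch that computes it for flat lists of predicted and ground-truth class labels. It is not the Cityscapes or CamVid evaluation code: the function name `mean_iou` is hypothetical, ignore labels are not handled, and classes absent from both prediction and ground truth are simply skipped, which is one of several possible conventions.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in `pred` or `target`.

    `pred` and `target` are equal-length sequences of integer class
    labels in the range [0, num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes

    for p, t in zip(pred, target):
        if p == t:
            # A correctly labelled pixel counts once in both sets.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mislabelled pixel enlarges the union of both classes.
            union[p] += 1
            union[t] += 1

    # Skip classes that never appear, to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Usage: class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Benchmark scores are normally obtained by accumulating intersection and union counts over the whole evaluation set before dividing, rather than averaging per-image values.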