
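Abstract
Semantic segmentation requires both rich spatial information and a sizeable receptive field. However, modern approaches usually sacrifice spatial resolution to reach real-time inference speed, which leads to poor performance. In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). We first design a Spatial Path with a small stride to preserve spatial information and generate high-resolution features. Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain a sufficient receptive field. On top of the two paths, we introduce a new Feature Fusion Module to combine features efficiently. The proposed architecture strikes a good balance between speed and segmentation performance on the Cityscapes, CamVid, and COCO-Stuff datasets. Specifically, for a 2048x1024 input image, we reach 105 FPS on a single NVIDIA Titan XP card with 68.4% Mean IOU on the Cityscapes test dataset, which is significantly faster than existing methods with comparable performance.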
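The accuracy figure quoted in the abstract is a mean intersection-over-union (Mean IOU). As a reminder of what that metric measures, the following is a minimal sketch in plain Python. The function name `mean_iou` and the toy labels are illustrative assumptions, not code from the paper or any listed repository; benchmark evaluations typically accumulate counts over the whole dataset and ignore void labels, which this sketch does not do.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels
    (hypothetical toy format; real pipelines use label images).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class absent from both; leave it out of the mean
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six "pixels".
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mean IoU = {score:.3f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.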
Code Repositories

| Repository | Framework | Note |
|---|---|---|
| PaddlePaddle/PaddleSeg | paddle | |
| yakhyo/face-parsing | pytorch | Mentioned in GitHub |
| CodePlay2016/BiSENet-TF | tf | Mentioned in GitHub |
| ycszen/TorchSeg | pytorch | Mentioned in GitHub |
| SharifElfouly/easy-model-zoo | pytorch | Mentioned in GitHub |
| kritiksoman/GIMP-ML | pytorch | Mentioned in GitHub |
| akinoriosamura/TorchSeg-mirror | pytorch | Mentioned in GitHub |
| hm7455/Anti-collision_Semantic-segmentation_ | tf | Mentioned in GitHub |
| zh320/realtime-semantic-segmentation-pytorch | pytorch | Mentioned in GitHub |
| Blaizzy/BiSeNet-Implementation | tf | Mentioned in GitHub |
| ooooverflow/BiSeNet | pytorch | Mentioned in GitHub |
| osmr/imgclsmob | mxnet | Mentioned in GitHub |
| renhaa/semantic-diffusion | pytorch | Mentioned in GitHub |
| Shuai-Xie/BiSeNet-CCP | pytorch | Mentioned in GitHub |
| pdoublerainbow/bisenet-tensorflow | tf | Mentioned in GitHub |
| GuangyanZhang/SCNN-Deeplabv3-bisenet-icnet | paddle | Mentioned in GitHub |
| AmrElsersy/PointPainting | pytorch | Mentioned in GitHub |
| CoinCheung/BiSeNet | pytorch | Mentioned in GitHub |
| justld/BisNetV1_paddle | paddle | |
| kirilcvetkov92/Semantic-Segmentation | | Mentioned in GitHub |

Benchmarks

| Benchmark | Method | Metrics |
|---|---|---|
| dichotomous-image-segmentation-on-dis-te1 | BSV1 | E-measure: 0.741; HCE: 288; MAE: 0.108; S-Measure: 0.695; max F-Measure: 0.595; weighted F-measure: 0.474 |
| dichotomous-image-segmentation-on-dis-te2 | BSV1 | E-measure: 0.781; HCE: 621; MAE: 0.111; S-Measure: 0.740; max F-Measure: 0.680; weighted F-measure: 0.564 |
| dichotomous-image-segmentation-on-dis-te3 | BSV1 | E-measure: 0.801; HCE: 1146; MAE: 0.109; S-Measure: 0.757; max F-Measure: 0.710; weighted F-measure: 0.595 |
| dichotomous-image-segmentation-on-dis-te4 | BSV1 | E-measure: 0.788; HCE: 3999; MAE: 0.114; S-Measure: 0.755; max F-Measure: 0.710; weighted F-measure: 0.598 |
| dichotomous-image-segmentation-on-dis-vd | BSV1 | E-measure: 0.767; HCE: 1660; MAE: 0.116; S-Measure: 0.728; max F-Measure: 0.662; weighted F-measure: 0.548 |
| real-time-semantic-segmentation-on-camvid | BiSeNet | mIoU: 68.7% |
| real-time-semantic-segmentation-on-cityscapes | BiSeNet(ResNet-18) | Frame (fps): 65.5; Time (ms): 15.2; mIoU: 74.7% |
| real-time-semantic-segmentation-on-cityscapes | BiSeNet(Xception39) | Frame (fps): 105.8; Time (ms): 9.5; mIoU: 68.4% |
| real-time-semantic-segmentation-on-cityscapes | BiSeNet | Frame (fps): 65.5; mIoU: 74.7% |
| semantic-segmentation-on-bdd100k-val | BiSeNet-V1(ResNet-18) | mIoU: 53.8 (45.1 fps) |
| semantic-segmentation-on-camvid | BiSeNet | Mean IoU: 68.7% |
| semantic-segmentation-on-cityscapes | BiSeNet (ResNet-101) | Mean IoU (class): 78.9% |
| semantic-segmentation-on-skyscapes-dense-1 | BiSeNet (ResNet-50) | Mean IoU: 30.82 |
| semantic-segmentation-on-trans10k | BiSeNet | GFLOPs: 19.91; mIoU: 58.40% |
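
The Cityscapes real-time rows report both a frame rate and a per-frame time. Assuming frames are processed one at a time, the two should be related by time (ms) = 1000 / fps. The short sketch below checks that relation against the table values; the helper name `latency_ms` and the dictionary layout are illustrative assumptions.

```python
def latency_ms(fps):
    """Per-frame latency in milliseconds for a given frames-per-second rate."""
    return 1000.0 / fps


# (fps, reported time in ms) copied from the Cityscapes rows in the table.
reported = {
    "BiSeNet(ResNet-18)": (65.5, 15.2),
    "BiSeNet(Xception39)": (105.8, 9.5),
}

for name, (fps, time_ms) in reported.items():
    derived = latency_ms(fps)
    print(f"{name}: derived {derived:.2f} ms, reported {time_ms} ms")
```

The derived values (about 15.27 ms and 9.45 ms) agree with the reported times to within roughly 0.1 ms, so the two columns appear to describe the same measurement up to rounding.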