CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers

Abstract

Scene understanding based on image segmentation is a crucial component of autonomous vehicles. Pixel-wise semantic segmentation of RGB images can be further improved by exploiting complementary features from an auxiliary modality (X-modality). However, due to differences in sensor characteristics across modalities, covering a wide variety of sensors with a single modality-agnostic model remains an unresolved problem. Unlike previous modality-specific methods, this work proposes a unified fusion framework, CMX, for RGB-X semantic segmentation. To generalize well across different modalities, which often carry complementary information as well as uncertainties, a unified cross-modal interaction is crucial for modality fusion. Specifically, we design a Cross-Modal Feature Rectification Module (CM-FRM), which calibrates the bi-modal features by leveraging the features of one modality to rectify those of the other. On top of the rectified feature pairs, we deploy a Feature Fusion Module (FFM) to perform a thorough exchange of long-range contextual information before mixing. To verify the effectiveness of CMX, we unify, for the first time, five modalities complementary to RGB: depth, thermal, polarization, event, and LiDAR. Extensive experiments show that CMX generalizes well to diverse multi-modal fusion, achieving state-of-the-art performance on five RGB-Depth benchmarks as well as on RGB-Thermal, RGB-Polarization, and RGB-LiDAR datasets. Furthermore, to investigate its generalizability to dense-sparse data fusion, we establish an RGB-Event semantic segmentation benchmark based on the EventScape dataset, on which CMX also sets a new state of the art. The source code of CMX is publicly available at https://github.com/huaaaliu/RGBX_Semantic_Segmentation.
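The two modules named in the abstract carry the fusion design: CM-FRM rectifies each modality's feature map using cues from the other modality, and FFM exchanges long-range context between the two streams before mixing them into a single feature map for the decoder. The sketch below is a minimal, simplified PyTorch illustration of this two-stage idea, not the official implementation (the released code uses more elaborate channel-/spatial-wise rectification and cross-attention fusion; see the linked repository). All class names, layer choices, and hyperparameters here are hypothetical.

```python
import torch
import torch.nn as nn


class FeatureRectification(nn.Module):
    """Sketch of a CM-FRM-style rectification step (hypothetical simplification).

    Each modality's feature map is rectified by adding gated features of the
    other modality: channel gates come from pooled global context, spatial
    gates from a shared convolution on the concatenated features.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel-wise gates (one vector per modality) from global context.
        self.channel_mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )
        # Spatial gates (one map per modality) from the concatenated features.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb, x_mod):
        b, c, h, w = rgb.shape
        cat = torch.cat([rgb, x_mod], dim=1)              # (B, 2C, H, W)
        gates_c = self.channel_mlp(cat.mean(dim=(2, 3)))  # (B, 2C)
        g_rgb, g_x = gates_c[:, :c, None, None], gates_c[:, c:, None, None]
        gates_s = self.spatial_conv(cat)                  # (B, 2, H, W)
        s_rgb, s_x = gates_s[:, :1], gates_s[:, 1:]
        # Each stream is corrected by the gated features of the other stream.
        rgb_out = rgb + g_x * x_mod + s_x * x_mod
        x_out = x_mod + g_rgb * rgb + s_rgb * rgb
        return rgb_out, x_out


class FeatureFusion(nn.Module):
    """Sketch of an FFM-style fusion step: cross-attention exchanges long-range
    context between the two streams, then a 1x1 convolution mixes them."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_x = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb, x_mod):
        b, c, h, w = rgb.shape
        rgb_seq = rgb.flatten(2).transpose(1, 2)          # (B, HW, C)
        x_seq = x_mod.flatten(2).transpose(1, 2)
        # Each stream queries the other for long-range context before mixing.
        rgb_ctx, _ = self.attn_rgb(rgb_seq, x_seq, x_seq)
        x_ctx, _ = self.attn_x(x_seq, rgb_seq, rgb_seq)
        rgb_f = (rgb_seq + rgb_ctx).transpose(1, 2).reshape(b, c, h, w)
        x_f = (x_seq + x_ctx).transpose(1, 2).reshape(b, c, h, w)
        return self.mix(torch.cat([rgb_f, x_f], dim=1))   # fused feature for the decoder


if __name__ == "__main__":
    # One backbone stage with hypothetical shapes: RGB and depth features.
    rgb_feat = torch.randn(2, 64, 32, 32)
    depth_feat = torch.randn(2, 64, 32, 32)
    rgb_r, depth_r = FeatureRectification(64)(rgb_feat, depth_feat)
    fused = FeatureFusion(64)(rgb_r, depth_r)
    print(fused.shape)  # torch.Size([2, 64, 32, 32])
```

In the paper this rectify-then-fuse pattern is applied at every stage of a two-stream Transformer backbone, and the fused per-stage features are passed to a segmentation decoder; the sketch shows a single stage only.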

Code Repository

huaaaliu/rgbx_semantic_segmentation
Official
pytorch
Mentioned on GitHub

Benchmarks

| Benchmark | Method | Metric | Value |
|---|---|---|---|
| camouflaged-object-segmentation-on-pcod-1200 | CMX | S-Measure | 0.922 |
| image-manipulation-localization-on-casia-v1 | CMX (RGB+NP++) | Average Pixel F1 (fixed threshold) | 0.761 |
| image-manipulation-localization-on-casia-v1 | CMX (RGB+Bayar) | Average Pixel F1 (fixed threshold) | 0.774 |
| image-manipulation-localization-on-casia-v1 | CMX (RGB+SRM) | Average Pixel F1 (fixed threshold) | 0.791 |
| image-manipulation-localization-on-cocoglide | CMX (RGB+Bayar) | Average Pixel F1 (fixed threshold) | 0.566 |
| image-manipulation-localization-on-cocoglide | CMX (RGB+SRM) | Average Pixel F1 (fixed threshold) | 0.585 |
| image-manipulation-localization-on-cocoglide | CMX (RGB+NP++) | Average Pixel F1 (fixed threshold) | 0.516 |
| image-manipulation-localization-on-columbia | CMX (RGB+Bayar) | Average Pixel F1 (fixed threshold) | 0.872 |
| image-manipulation-localization-on-columbia | CMX (RGB+SRM) | Average Pixel F1 (fixed threshold) | 0.834 |
| image-manipulation-localization-on-columbia | CMX (RGB+NP++) | Average Pixel F1 (fixed threshold) | 0.884 |
| image-manipulation-localization-on-coverage | CMX (RGB+NP++) | Average Pixel F1 (fixed threshold) | 0.577 |
| image-manipulation-localization-on-coverage | CMX (RGB+SRM) | Average Pixel F1 (fixed threshold) | 0.630 |
| image-manipulation-localization-on-coverage | CMX (RGB+Bayar) | Average Pixel F1 (fixed threshold) | 0.592 |
| image-manipulation-localization-on-dso-1 | CMX (RGB+Bayar) | Average Pixel F1 (fixed threshold) | 0.776 |
| image-manipulation-localization-on-dso-1 | CMX (RGB+SRM) | Average Pixel F1 (fixed threshold) | 0.792 |
| image-manipulation-localization-on-dso-1 | CMX (RGB+NP++) | Average Pixel F1 (fixed threshold) | 0.895 |
| multispectral-object-detection-on-flir-1 | CMX | mAP50 | 82.2% |
| object-detection-on-dsec | CMX | mAP | 29.1 |
| object-detection-on-eventped | CMX | AP | 58.0 |
| object-detection-on-inoutdoor | CMX | AP | 62.3 |
| object-detection-on-pku-ddd17-car | CMX | mAP50 | 80.4 |
| object-detection-on-stcrowd | CMX | AP | 61.0 |
| pedestrian-detection-on-cvc14 | CMX | AP50 | 68.9 |
| pedestrian-detection-on-dvtod | CMX | mAP | 81.6 |
| pedestrian-detection-on-llvip | CMX | AP | 0.596 |
| semantic-segmentation-on-bjroad | CMX | IoU | 62.28 |
| semantic-segmentation-on-cityscapes-val | CMX (B4) | mIoU | 82.6 |
| semantic-segmentation-on-cityscapes-val | CMX (B2) | mIoU | 81.6 |
| semantic-segmentation-on-ddd17 | CMX | mIoU | 71.88 |
| semantic-segmentation-on-deliver | CMX (RGB-Depth) | mIoU | 62.67 |
| semantic-segmentation-on-deliver | CMX (RGB-LiDAR) | mIoU | 56.37 |
| semantic-segmentation-on-deliver | CMX (RGB-Event) | mIoU | 56.52 |
| semantic-segmentation-on-dsec | CMX | mIoU | 72.42 |
| semantic-segmentation-on-event-based | CMX | mIoU | 85.81 |
| semantic-segmentation-on-eventscape | CMX (B2) | mIoU | 61.90 |
| semantic-segmentation-on-eventscape | CMX (B4) | mIoU | 64.28 |
| semantic-segmentation-on-gamus | CMX | mIoU | 75.23 |
| semantic-segmentation-on-kitti-360 | CMX (RGB-LiDAR) | mIoU | 64.31 |
| semantic-segmentation-on-kitti-360 | CMX (RGB-Depth) | mIoU | 64.43 |
| semantic-segmentation-on-llrgbd-synthetic | CMX (SegFormer-B2) | mIoU | 66.52 |
| semantic-segmentation-on-nyu-depth-v2 | CMX (B2) | Mean IoU | 54.4% |
| semantic-segmentation-on-nyu-depth-v2 | CMX (B5) | Mean IoU | 56.9% |
| semantic-segmentation-on-nyu-depth-v2 | CMX (B4) | Mean IoU | 56.3% |
| semantic-segmentation-on-porto | CMX | IoU | 72.85 |
| semantic-segmentation-on-potsdam | CMX | mIoU | 85.97 |
| semantic-segmentation-on-replica | CMX | mIoU | 17.0 |
| semantic-segmentation-on-scannetv2 | CMX | Mean IoU | 61.3% |
| semantic-segmentation-on-selma | CMX | mIoU | 91.7 |
| semantic-segmentation-on-spectralwaste | CMX (RGB-HYPER3) | mIoU | 56.6 |
| semantic-segmentation-on-spectralwaste | CMX (RGB-HYPER) | mIoU | 58.2 |
| semantic-segmentation-on-stanford2d3d-rgbd | CMX (SegFormer-B2) | Pixel Accuracy | 82.3 |
| semantic-segmentation-on-stanford2d3d-rgbd | CMX (SegFormer-B2) | mIoU | 61.2 |
| semantic-segmentation-on-stanford2d3d-rgbd | CMX (SegFormer-B4) | Pixel Accuracy | 82.6 |
| semantic-segmentation-on-stanford2d3d-rgbd | CMX (SegFormer-B4) | mIoU | 62.1 |
| semantic-segmentation-on-sun-rgbd | CMX (B4) | Mean IoU | 52.1% |
| semantic-segmentation-on-sun-rgbd | CMX (B5) | Mean IoU | 52.4% |
| semantic-segmentation-on-sun-rgbd | DPLNet | Mean IoU | 49.7% |
| semantic-segmentation-on-syn-udtiri | CMX | IoU | 93.31 |
| semantic-segmentation-on-synthetic-bathing | CMX-SRA | mIoU | 94.20 |
| semantic-segmentation-on-synthetic-bathing | CMX | mIoU | 88.23 |
| semantic-segmentation-on-tlcgis | CMX | IoU | 84.14 |
| semantic-segmentation-on-uplight | CMX (B2 RGB-DoLP) | mIoU | 92.07 |
| semantic-segmentation-on-uplight | CMX (B2 RGB-AoLP) | mIoU | 92.13 |
| semantic-segmentation-on-us3d | CMX | mIoU | 84.63 |
| semantic-segmentation-on-vaihingen | CMX | mIoU | 82.87 |
| semantic-segmentation-on-zju-rgb-p | CMX (B4 RGB-AoLP) | mIoU | 92.6 |
| semantic-segmentation-on-zju-rgb-p | CMX (B2 RGB-DoLP) | mIoU | 92.2 |
| thermal-image-segmentation-on-kp-day-night | CMX | mIoU | 46.2 |
| thermal-image-segmentation-on-mfn-dataset | CMX (B4) | mIoU | 59.7 |
| thermal-image-segmentation-on-mfn-dataset | CMX (B2) | mIoU | 58.2 |
| thermal-image-segmentation-on-noisy-rs-rgb-t | CMX (B4) | mIoU | 56.1 |
| thermal-image-segmentation-on-rgb-t-glass | CMX | MAE | 0.029 |
