
Abstract
In recent years, the research community has shown strong interest in panoramic images, which provide a full 360-degree directional view of a scene. To fully realize their potential, multiple data modalities can be combined so that their complementary characteristics yield more robust and richer scene understanding through semantic segmentation. However, existing research has predominantly focused on RGB-X semantic segmentation under the pinhole camera model. In this work, we propose a transformer-based cross-modal fusion architecture that bridges the gap between multi-modal fusion and omnidirectional scene perception. To handle the extreme object deformations and panoramic distortions introduced by the equirectangular representation, we employ distortion-aware modules. In addition, before merging the features, we perform cross-modal interactions for feature rectification and information exchange, communicating long-range context across bi-modal and tri-modal feature streams. In thorough experiments covering combinations of four different modality types on three indoor panoramic datasets, our method achieves state-of-the-art mIoU performance: 60.60% on Stanford2D3DS (RGB-HHA), 71.97% on Structured3D (RGB-D-N), and 35.92% on Matterport3D (RGB-D). Code and trained models will be released soon.
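To make the cross-modal interaction step concrete, below is a minimal PyTorch sketch of channel-wise feature rectification between an RGB stream and an auxiliary stream (depth, HHA, or normals) before fusion. The module name and its internals are illustrative assumptions, not the authors' exact implementation; the paper's full architecture additionally contains distortion-aware modules and the subsequent fusion stage, which are omitted here.

```python
import torch
import torch.nn as nn


class CrossModalRectification(nn.Module):
    """Hypothetical sketch of channel-wise cross-modal feature rectification.

    Per-channel gates are predicted from the pooled joint context of both
    modalities; each stream is then corrected with gated features from the
    other stream before the actual fusion step.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(2 * channels // reduction, 8)
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, x_rgb: torch.Tensor, x_aux: torch.Tensor):
        b, c, _, _ = x_rgb.shape
        # Pooled joint context of both streams: (B, 2C)
        ctx = torch.cat([x_rgb, x_aux], dim=1).mean(dim=(2, 3))
        # Split the predicted gates into one per-channel weight per stream.
        w_rgb, w_aux = self.gate(ctx).view(b, 2, c).unbind(dim=1)
        # Each stream borrows gated information from the other one.
        rect_rgb = x_rgb + x_aux * w_aux.view(b, c, 1, 1)
        rect_aux = x_aux + x_rgb * w_rgb.view(b, c, 1, 1)
        return rect_rgb, rect_aux


# Usage on one encoder stage of an equirectangular input
# (e.g. a 512x1024 panorama downsampled 4x):
rect = CrossModalRectification(channels=64)
f_rgb = torch.randn(2, 64, 128, 256)   # RGB features
f_aux = torch.randn(2, 64, 128, 256)   # depth/HHA/normal features
f_rgb, f_aux = rect(f_rgb, f_aux)
```

A block like this would sit at each encoder stage of a two-stream backbone, so that rectified features, rather than raw modality features, enter the fusion module.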
Code Repository
sguttikon/SFSS-MMSI (official, PyTorch)
Benchmarks
Matterport3D (semantic segmentation)
| Method | Test mIoU (%) | Validation mIoU (%) |
|---|---|---|
| SFSS-MMSI (RGB Only) | 31.30 | 35.15 |
| SFSS-MMSI (RGB+Depth) | 35.92 | 39.19 |
| SFSS-MMSI (RGB+Normal) | 35.77 | 38.91 |
| SFSS-MMSI (RGB+Depth+Normal) | 35.52 | 39.26 |

Stanford2D3DS (semantic segmentation)
| Method | mAcc (%) | mIoU (%) |
|---|---|---|
| SFSS-MMSI (RGB Only) | 63.96 | 52.87 |
| SFSS-MMSI (RGB+Depth) | 68.57 | 55.49 |
| SFSS-MMSI (RGB+Normal) | 68.79 | 58.24 |
| SFSS-MMSI (RGB+HHA) | 70.68 | 60.60 |
| SFSS-MMSI (RGB+Depth+Normal) | 69.03 | 59.43 |

Structured3D (semantic segmentation)
| Method | Test mIoU (%) | Validation mIoU (%) |
|---|---|---|
| SFSS-MMSI (RGB Only) | 68.34 | 71.94 |
| SFSS-MMSI (RGB+Depth) | 70.17 | 73.78 |
| SFSS-MMSI (RGB+Normal) | 71.00 | 74.38 |
| SFSS-MMSI (RGB+Depth+Normal) | 71.97 | 75.86 |
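All tables report mean Intersection-over-Union (mIoU); Stanford2D3DS additionally reports mean class accuracy (mAcc). For reference, here is a minimal NumPy sketch of how mIoU is typically computed from predicted and ground-truth label maps; the function name and the ignore_index convention are assumptions, not taken from the SFSS-MMSI evaluation code.

```python
import numpy as np


def mean_iou(pred: np.ndarray, gt: np.ndarray,
             num_classes: int, ignore_index: int = 255) -> float:
    """Compute mean IoU over classes from integer label maps."""
    # Drop pixels marked as ignore (e.g. unlabeled regions).
    mask = gt != ignore_index
    pred = pred[mask].astype(np.int64)
    gt = gt[mask].astype(np.int64)
    # Confusion matrix via bincount on flattened (gt, pred) class pairs.
    cm = np.bincount(num_classes * gt + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    valid = union > 0  # skip classes absent from both pred and gt
    return float((inter[valid] / union[valid]).mean())
```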