5 months ago

CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers

Zhang Jiaming ; Liu Huayao ; Yang Kailun ; Hu Xinxin ; Liu Ruiping ; Stiefelhagen Rainer

Abstract

Scene understanding based on image segmentation is a crucial component of autonomous vehicles. Pixel-wise semantic segmentation of RGB images can be advanced by exploiting complementary features from the supplementary modality (X-modality). However, covering a wide variety of sensors with a modality-agnostic model remains an unresolved problem due to variations in sensor characteristics among different modalities. Unlike previous modality-specific methods, in this work, we propose a unified fusion framework, CMX, for RGB-X semantic segmentation. To generalize well across different modalities, that often include supplements as well as uncertainties, a unified cross-modal interaction is crucial for modality fusion. Specifically, we design a Cross-Modal Feature Rectification Module (CM-FRM) to calibrate bi-modal features by leveraging the features from one modality to rectify the features of the other modality. With rectified feature pairs, we deploy a Feature Fusion Module (FFM) to perform sufficient exchange of long-range contexts before mixing. To verify CMX, for the first time, we unify five modalities complementary to RGB, i.e., depth, thermal, polarization, event, and LiDAR. Extensive experiments show that CMX generalizes well to diverse multi-modal fusion, achieving state-of-the-art performances on five RGB-Depth benchmarks, as well as RGB-Thermal, RGB-Polarization, and RGB-LiDAR datasets. Besides, to investigate the generalizability to dense-sparse data fusion, we establish an RGB-Event semantic segmentation benchmark based on the EventScape dataset, on which CMX sets the new state-of-the-art. The source code of CMX is publicly available at https://github.com/huaaaliu/RGBX_Semantic_Segmentation.
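The idea of using one modality's features to rectify the other's can be illustrated with a toy sketch. This is not the paper's CM-FRM: the learned layers are replaced here by a fixed, parameter-free sigmoid gate on channel means, and the names `rectify`, `rectify_pair`, and `strength` are hypothetical. Features are plain nested lists of shape channels x positions.

```python
import math


def channel_means(feat):
    """Global average of each channel (feat is a list of channels)."""
    return [sum(channel) / len(channel) for channel in feat]


def rectify(feat_a, feat_b, strength=0.5):
    """Add a gated copy of feat_b to feat_a, channel by channel.

    The gate is a sigmoid of feat_b's channel mean. This is an assumed,
    parameter-free stand-in for the learned weights a real module would use.
    """
    gates = [1.0 / (1.0 + math.exp(-m)) for m in channel_means(feat_b)]
    return [
        [a + strength * gate * b for a, b in zip(ch_a, ch_b)]
        for ch_a, ch_b, gate in zip(feat_a, feat_b, gates)
    ]


def rectify_pair(rgb, x, strength=0.5):
    """Rectify both streams symmetrically, each using the other one."""
    return rectify(rgb, x, strength), rectify(x, rgb, strength)


# Two channels, two spatial positions per channel.
rgb = [[1.0, 2.0], [0.0, 0.0]]
x = [[0.0, 0.0], [1.0, -1.0]]
rgb_out, x_out = rectify_pair(rgb, x)
```

The point of the sketch is the symmetry: each stream keeps its own features and receives a weighted correction derived from the other stream, so the output is still a pair of feature maps that a later fusion step can mix.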

Code Repositories

huaaaliu/rgbx_semantic_segmentation
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
camouflaged-object-segmentation-on-pcod-1200 | CMX | S-Measure: 0.922
image-manipulation-localization-on-casia-v1 | CMX (RGB+NP++) | Average Pixel F1 (Fixed threshold): .761
image-manipulation-localization-on-casia-v1 | CMX (RGB+Bayar) | Average Pixel F1 (Fixed threshold): .774
image-manipulation-localization-on-casia-v1 | CMX (RGB+SRM) | Average Pixel F1 (Fixed threshold): .791
image-manipulation-localization-on-cocoglide | CMX (RGB+Bayar) | Average Pixel F1 (Fixed threshold): .566
image-manipulation-localization-on-cocoglide | CMX (RGB+SRM) | Average Pixel F1 (Fixed threshold): .585
image-manipulation-localization-on-cocoglide | CMX (RGB+NP++) | Average Pixel F1 (Fixed threshold): .516
image-manipulation-localization-on-columbia | CMX (RGB+Bayar) | Average Pixel F1 (Fixed threshold): .872
image-manipulation-localization-on-columbia | CMX (RGB+SRM) | Average Pixel F1 (Fixed threshold): .834
image-manipulation-localization-on-columbia | CMX (RGB+NP++) | Average Pixel F1 (Fixed threshold): .884
image-manipulation-localization-on-coverage | CMX (RGB+NP++) | Average Pixel F1 (Fixed threshold): .577
image-manipulation-localization-on-coverage | CMX (RGB+SRM) | Average Pixel F1 (Fixed threshold): .630
image-manipulation-localization-on-coverage | CMX (RGB+Bayar) | Average Pixel F1 (Fixed threshold): .592
image-manipulation-localization-on-dso-1 | CMX (RGB+Bayar) | Average Pixel F1 (Fixed threshold): .776
image-manipulation-localization-on-dso-1 | CMX (RGB+SRM) | Average Pixel F1 (Fixed threshold): .792
image-manipulation-localization-on-dso-1 | CMX (RGB+NP++) | Average Pixel F1 (Fixed threshold): .895
multispectral-object-detection-on-flir-1 | CMX | mAP50: 82.2%
object-detection-on-dsec | CMX | mAP: 29.1
object-detection-on-eventped | CMX | AP: 58.0
object-detection-on-inoutdoor | CMX | AP: 62.3
object-detection-on-pku-ddd17-car | CMX | mAP50: 80.4
object-detection-on-stcrowd | CMX | AP: 61.0
pedestrian-detection-on-cvc14 | CMX | AP50: 68.9
pedestrian-detection-on-dvtod | CMX | mAP: 81.6
pedestrian-detection-on-llvip | CMX | AP: 0.596
semantic-segmentation-on-bjroad | CMX | IoU: 62.28
semantic-segmentation-on-cityscapes-val | CMX (B4) | mIoU: 82.6
semantic-segmentation-on-cityscapes-val | CMX (B2) | mIoU: 81.6
semantic-segmentation-on-ddd17 | CMX | mIoU: 71.88
semantic-segmentation-on-deliver | CMX (RGB-Depth) | mIoU: 62.67
semantic-segmentation-on-deliver | CMX (RGB-LiDAR) | mIoU: 56.37
semantic-segmentation-on-deliver | CMX (RGB-Event) | mIoU: 56.52
semantic-segmentation-on-dsec | CMX | mIoU: 72.42
semantic-segmentation-on-event-based | CMX | mIoU: 85.81
semantic-segmentation-on-eventscape | CMX (B2) | mIoU: 61.90
semantic-segmentation-on-eventscape | CMX (B4) | mIoU: 64.28
semantic-segmentation-on-gamus | CMX | mIoU: 75.23
semantic-segmentation-on-kitti-360 | CMX (RGB-LiDAR) | mIoU: 64.31
semantic-segmentation-on-kitti-360 | CMX (RGB-Depth) | mIoU: 64.43
semantic-segmentation-on-llrgbd-synthetic | CMX (SegFormer-B2) | mIoU: 66.52
semantic-segmentation-on-nyu-depth-v2 | CMX (B2) | Mean IoU: 54.4%
semantic-segmentation-on-nyu-depth-v2 | CMX (B5) | Mean IoU: 56.9%
semantic-segmentation-on-nyu-depth-v2 | CMX (B4) | Mean IoU: 56.3%
semantic-segmentation-on-porto | CMX | IoU: 72.85
semantic-segmentation-on-potsdam | CMX | mIoU: 85.97
semantic-segmentation-on-replica | CMX | mIoU: 17.0
semantic-segmentation-on-scannetv2 | CMX | Mean IoU: 61.3%
semantic-segmentation-on-selma | CMX | mIoU: 91.7
semantic-segmentation-on-spectralwaste | CMX (RGB-HYPER3) | mIoU: 56.6
semantic-segmentation-on-spectralwaste | CMX (RGB-HYPER) | mIoU: 58.2
semantic-segmentation-on-stanford2d3d-rgbd | CMX (SegFormer-B2) | Pixel Accuracy: 82.3; mIoU: 61.2
semantic-segmentation-on-stanford2d3d-rgbd | CMX (SegFormer-B4) | Pixel Accuracy: 82.6; mIoU: 62.1
semantic-segmentation-on-sun-rgbd | CMX (B4) | Mean IoU: 52.1%
semantic-segmentation-on-sun-rgbd | CMX (B5) | Mean IoU: 52.4%
semantic-segmentation-on-sun-rgbd | DPLNet | Mean IoU: 49.7%
semantic-segmentation-on-syn-udtiri | CMX | IoU: 93.31
semantic-segmentation-on-synthetic-bathing | CMX-SRA | mIoU: 94.20
semantic-segmentation-on-synthetic-bathing | CMX | mIoU: 88.23
semantic-segmentation-on-tlcgis | CMX | IoU: 84.14
semantic-segmentation-on-uplight | CMX (B2 RGB-DoLP) | mIoU: 92.07
semantic-segmentation-on-uplight | CMX (B2 RGB-AoLP) | mIoU: 92.13
semantic-segmentation-on-us3d | CMX | mIoU: 84.63
semantic-segmentation-on-vaihingen | CMX | mIoU: 82.87
semantic-segmentation-on-zju-rgb-p | CMX (B4 RGB-AoLP) | mIoU: 92.6
semantic-segmentation-on-zju-rgb-p | CMX (B2 RGB-DoLP) | mIoU: 92.2
thermal-image-segmentation-on-kp-day-night | CMX | mIoU: 46.2
thermal-image-segmentation-on-mfn-dataset | CMX (B4) | mIOU: 59.7
thermal-image-segmentation-on-mfn-dataset | CMX (B2) | mIOU: 58.2
thermal-image-segmentation-on-noisy-rs-rgb-t | CMX (B4) | mIoU: 56.1
thermal-image-segmentation-on-rgb-t-glass | CMX | MAE: 0.029
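
Most entries above report mIoU (mean intersection-over-union). As a minimal sketch of the common definition, the snippet below builds a confusion matrix from flat lists of predicted and ground-truth class indices and averages the per-class IoU. The function name `mean_iou` is hypothetical, and individual benchmarks may use different conventions (ignored labels, dataset-level accumulation, percentage scaling) that are not modeled here.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_pos = sum(confusion[t][c] for t in range(num_classes)) - true_pos
        false_neg = sum(confusion[c]) - true_pos
        union = true_pos + false_pos + false_neg
        if union > 0:  # skip classes absent from both pred and target
            ious.append(true_pos / union)
    return sum(ious) / len(ious) if ious else 0.0


# Six pixels, three classes: per-class IoU is 1/3, 2/3 and 1/2.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In practice the confusion matrix is usually accumulated over the whole validation set before the per-class IoU is computed, rather than averaged image by image.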
