DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

Bowen Yin; Xuying Zhang; Zhongyu Li; Li Liu; Ming-Ming Cheng; Qibin Hou

Abstract

We present DFormer, a novel RGB-D pretraining framework that learns transferable representations for RGB-D segmentation tasks. DFormer has two key innovations: 1) unlike previous works that encode RGB-D information with an RGB-pretrained backbone, we pretrain the backbone on image-depth pairs built from ImageNet-1K, so DFormer is endowed with the capacity to encode RGB-D representations; 2) DFormer comprises a sequence of RGB-D blocks tailored to encode both RGB and depth information through a novel building-block design. DFormer thereby avoids the mismatch that arises when RGB-pretrained backbones encode the 3D geometric relationships in depth maps, a problem that is widespread in existing methods but has remained unresolved. We finetune the pretrained DFormer on two popular RGB-D tasks, i.e., RGB-D semantic segmentation and RGB-D salient object detection, with a lightweight decoder head. Experimental results show that DFormer achieves new state-of-the-art performance on both tasks at less than half the computational cost of the current best methods, across two RGB-D semantic segmentation datasets and five RGB-D salient object detection datasets. Our code is available at: https://github.com/VCIP-RGBD/DFormer.
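
To make the idea of a dual-branch RGB-D building block concrete, the sketch below shows a simplified block in which a depth branch and an RGB branch are processed separately and the depth features gate the RGB features. This is only a hedged illustration under assumed design choices: the actual DFormer block differs (see the official repository), and the module and parameter names here (SimpleRGBDBlock, gate, etc.) are hypothetical.

```python
# Illustrative sketch only, NOT the official DFormer block: a simplified
# dual-branch RGB-D block where depth features gate the RGB features.
import torch
import torch.nn as nn

class SimpleRGBDBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Separate depthwise convolutions for the RGB and depth branches.
        self.rgb_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.depth_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        # Hypothetical fusion rule: depth features produce a sigmoid gate
        # that modulates the RGB features.
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, kernel_size=1), nn.Sigmoid())
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)
        self.norm = nn.BatchNorm2d(dim)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        rgb_feat = self.rgb_conv(rgb)
        depth_feat = self.depth_conv(depth)
        fused = self.proj(rgb_feat * self.gate(depth_feat))
        # Residual connections keep both modality streams flowing to the next block.
        return self.norm(fused + rgb), depth_feat + depth

# Example usage with dummy feature maps (batch 1, 64 channels, 56x56).
x_rgb = torch.randn(1, 64, 56, 56)
x_depth = torch.randn(1, 64, 56, 56)
block = SimpleRGBDBlock(64)
out_rgb, out_depth = block(x_rgb, x_depth)
print(out_rgb.shape, out_depth.shape)  # torch.Size([1, 64, 56, 56]) each
```

Stacking blocks of this kind over both modalities, and pretraining them jointly on image-depth pairs, is the general pattern the abstract describes; consult the repository for the authors' actual block design and pretrained weights.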

Code Repositories

VCIP-RGBD/DFormer (PyTorch)

Benchmarks

Benchmark                               | Method          | Metrics
RGB-D Salient Object Detection on DES   | DFormer-L       | Average MAE: 0.013, S-Measure: 94.8, max E-Measure: 98.0, max F-Measure: 95.6
RGB-D Salient Object Detection on NJU2K | DFormer-L       | Average MAE: 0.023, S-Measure: 93.7, max E-Measure: 96.4, max F-Measure: 94.6
RGB-D Salient Object Detection on NLPR  | DFormer-L       | Average MAE: 0.016, S-Measure: 94.2, max E-Measure: 97.1, max F-Measure: 93.9
RGB-D Salient Object Detection on SIP   | DFormer-L       | Average MAE: 0.032, S-Measure: 91.5, max E-Measure: 95.0, max F-Measure: 93.8
RGB-D Salient Object Detection on STERE | DFormer-L       | Average MAE: 0.030, S-Measure: 92.3, max E-Measure: 95.2, max F-Measure: 92.9
Semantic Segmentation on NYU Depth v2   | DFormer-T       | Mean IoU: 51.8%
Semantic Segmentation on NYU Depth v2   | DFormer-L       | Mean IoU: 57.2%
Semantic Segmentation on NYU Depth v2   | DFormer-B       | Mean IoU: 55.6%
Semantic Segmentation on NYU Depth v2   | DFormer-S       | Mean IoU: 53.6%
Semantic Segmentation on SUN-RGBD       | DFormer-L       | Mean IoU: 52.5%
Semantic Segmentation on SUN-RGBD       | FSFNet          | Mean IoU: 48.8%
Semantic Segmentation on SUN-RGBD       | DFormer-B       | Mean IoU: 51.2%
Semantic Segmentation on SUN-RGBD       | TokenFusion (S) | Mean IoU: 50.0%
Semantic Segmentation on SYN-UDTIRI     | DFormer         | IoU: 90.88
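
For reference, the sketch below shows how the two headline metrics in the tables above are typically computed: Mean IoU for semantic segmentation and Average MAE for salient object detection. It assumes integer label maps for segmentation and saliency maps normalized to [0, 1]; it is not the official evaluation code from the DFormer repository, and the other metrics (S-Measure, E-Measure, F-Measure) are omitted.

```python
# Hedged sketch of the standard Mean IoU and MAE metrics, assuming NumPy
# arrays; the DFormer repository ships its own evaluation scripts.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean Intersection-over-Union, averaged over classes that appear in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

def average_mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Absolute Error between a predicted saliency map and its binary mask."""
    return float(np.abs(pred.astype(np.float32) - gt.astype(np.float32)).mean())
```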
