CACFNet: Cross-Modal Attention Cascaded Fusion Network for RGB-T Urban Scene Parsing

Lu Yu, Meixin Fang, Shaohua Dong, WuJie Zhou

Abstract

Color–thermal (RGB-T) urban scene parsing has recently attracted widespread interest. However, most existing approaches to RGB-T urban scene parsing do not deeply explore the complementarity between RGB and thermal features. In this study, we propose a cross-modal attention-cascaded fusion network (CACFNet) that fully exploits cross-modal information. In our design, a cross-modal attention fusion module mines complementary information from the two modalities. Subsequently, a cascaded fusion module decodes the multi-level features in a top-down manner. Noting that each pixel is labeled with the category of the region to which it belongs, we present a region-based module that explores the relationship between pixels and regions. Moreover, in contrast to previous methods that employ only the cross-entropy loss to penalize pixel-wise predictions, we propose an additional loss that learns pixel–pixel relationships. Extensive experiments on two datasets demonstrate that the proposed CACFNet achieves state-of-the-art performance in RGB-T urban scene parsing.
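
As an illustration only (the paper's exact module design is not reproduced here), the sketch below shows one common way to fuse RGB and thermal feature maps with channel attention, where each modality's attention re-weights the other modality's features before summation. The class and variable names are hypothetical, not taken from the CACFNet implementation.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Illustrative cross-modal attention fusion for RGB and thermal features.

    Each modality produces a channel-attention vector that re-weights the
    other modality's feature map; the re-weighted maps are then summed.
    A generic sketch, not the exact CACFNet module.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.thermal_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # Attention derived from one modality modulates the other,
        # so each stream borrows what its counterpart emphasizes.
        rgb_enhanced = rgb * self.thermal_gate(thermal)
        thermal_enhanced = thermal * self.rgb_gate(rgb)
        return rgb_enhanced + thermal_enhanced


if __name__ == "__main__":
    fuse = CrossModalAttentionFusion(channels=64)
    rgb_feat = torch.randn(1, 64, 60, 80)      # RGB encoder feature map
    thermal_feat = torch.randn(1, 64, 60, 80)  # thermal encoder feature map
    fused = fuse(rgb_feat, thermal_feat)
    print(fused.shape)  # torch.Size([1, 64, 60, 80])
```

In a cascaded, top-down decoder such a fused map would typically be upsampled and combined with the fused features of the next shallower encoder stage; the details of that cascade, the region-based module, and the pixel–pixel loss are described in the paper itself.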

Benchmarks

Benchmark                                    Methodology   Metrics
thermal-image-segmentation-on-mfn-dataset    CACFNet       mIoU: 57.8
thermal-image-segmentation-on-pst900         CACFNet       mIoU: 86.56
