Multimodal Object Detection by Channel Switching and Spatial Attention

Zheng Liu, Erik Blasch, Jozsef Hamari, Junchi Bin, Yue Cao

Abstract

Multimodal object detection has attracted great attention in recent years, since the information specific to different modalities can complement each other and effectively improve the accuracy and stability of the detection model. However, compared to processing the inputs from a single modality, fusing information from multiple modalities can significantly increase the computational complexity of the model, thus impairing its efficiency. Therefore, the multimodal fusion module needs to be carefully designed to enhance the performance of the detection model while keeping the computational cost low. In this paper, we propose a novel lightweight fusion module that efficiently fuses the inputs from different modalities using channel switching and spatial attention (CSSA). The effectiveness and generalizability of the module are tested on two public multimodal datasets, LLVIP and FLIR, both of which comprise paired infrared (IR) and visible (RGB) images. The experiments demonstrate that the proposed CSSA module can substantially improve the accuracy of multimodal object detection without consuming excessive computing resources.
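The abstract names the two ingredients of CSSA but does not describe how they work, so the following is only a toy sketch of one plausible reading, not the authors' implementation. It assumes that "channel switching" replaces a weakly activated channel in one modality with the corresponding channel from the other, and that "spatial attention" blends the two modalities with a per-position weight. The gating rule (sigmoid of the channel mean), the `threshold` parameter, and the function names `channel_switch`, `spatial_attention_fuse`, and `cssa_fuse` are all hypothetical stand-ins for what would be learned layers in a real detector.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_switch(rgb, ir, threshold=0.5):
    """Swap in the other modality's channel where a channel scores low.

    rgb, ir: lists of channels, each channel a flat list of activations.
    The score (sigmoid of the channel mean) is an assumption standing in
    for a learned channel-attention gate.
    """
    out_rgb, out_ir = [], []
    for c_rgb, c_ir in zip(rgb, ir):
        s_rgb = sigmoid(sum(c_rgb) / len(c_rgb))
        s_ir = sigmoid(sum(c_ir) / len(c_ir))
        out_rgb.append(c_ir if s_rgb < threshold else c_rgb)
        out_ir.append(c_rgb if s_ir < threshold else c_ir)
    return out_rgb, out_ir


def spatial_attention_fuse(rgb, ir):
    """Blend the two modalities with one weight per spatial position.

    The weight is a softmax over the channel-averaged activation of each
    modality at that position (again an assumption, not the paper's layer).
    """
    n_ch, n_pos = len(rgb), len(rgb[0])
    fused = [[0.0] * n_pos for _ in range(n_ch)]
    for p in range(n_pos):
        m_rgb = sum(ch[p] for ch in rgb) / n_ch
        m_ir = sum(ch[p] for ch in ir) / n_ch
        e_rgb, e_ir = math.exp(m_rgb), math.exp(m_ir)
        w_rgb = e_rgb / (e_rgb + e_ir)
        for c in range(n_ch):
            fused[c][p] = w_rgb * rgb[c][p] + (1.0 - w_rgb) * ir[c][p]
    return fused


def cssa_fuse(rgb, ir, threshold=0.5):
    """Channel switching followed by spatial-attention fusion."""
    return spatial_attention_fuse(*channel_switch(rgb, ir, threshold))


# Two channels, two spatial positions per channel (made-up toy values).
rgb_feat = [[1.0, 1.0], [-2.0, -2.0]]
ir_feat = [[0.5, 0.5], [3.0, 3.0]]
fused_feat = cssa_fuse(rgb_feat, ir_feat)
```

In this toy input the second RGB channel scores below the threshold and is replaced by the IR channel before fusion. Neither step adds anything beyond per-channel and per-position statistics, which is in the spirit of the lightweight design the abstract argues for.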

Benchmarks

Benchmark | Methodology | Metrics
--- | --- | ---
multispectral-object-detection-on-flir-1 | ProbEn | mAP: 37.9%, mAP50: 75.5%
multispectral-object-detection-on-flir-1 | CSSA | mAP: 41.3%, mAP50: 79.2%
multispectral-object-detection-on-flir-1 | GAFF | mAP: 37.4%, mAP50: 74.6%
multispectral-object-detection-on-flir-1 | Halfway Fusion | mAP: 35.8%
pedestrian-detection-on-llvip | CSSA | AP: 0.592
pedestrian-detection-on-llvip | GAFF | AP: 0.558
pedestrian-detection-on-llvip | Halfway Fusion | AP: 0.551
pedestrian-detection-on-llvip | ProbEn | AP: 0.515
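To make the margins in the benchmark listing explicit, the short sketch below computes how far CSSA sits above the strongest competing method on each benchmark. The scores are copied from the listing; the helper name `margin_over_best_baseline` is hypothetical. FLIR mAP is given in percent while LLVIP AP is on a 0 to 1 scale, so the two margins are not directly comparable.

```python
# Scores copied from the benchmark listing above.
flir_map = {"ProbEn": 37.9, "CSSA": 41.3, "GAFF": 37.4, "Halfway Fusion": 35.8}
llvip_ap = {"CSSA": 0.592, "GAFF": 0.558, "Halfway Fusion": 0.551, "ProbEn": 0.515}


def margin_over_best_baseline(scores, method="CSSA"):
    """Return (strongest other method, score gap of `method` over it)."""
    best_name, best_score = max(
        ((name, s) for name, s in scores.items() if name != method),
        key=lambda item: item[1],
    )
    return best_name, round(scores[method] - best_score, 3)


print(margin_over_best_baseline(flir_map))   # ('ProbEn', 3.4)
print(margin_over_best_baseline(llvip_ap))   # ('GAFF', 0.034)
```

On these numbers CSSA leads the next-best listed method by 3.4 mAP points on FLIR and by 0.034 AP on LLVIP.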
