3 months ago

Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance

Abstract

The perception capability of robotic systems relies on the richness of the dataset. Although Segment Anything Model 2 (SAM2), trained on large datasets, demonstrates strong potential in perception tasks, its inherent training paradigm makes it unsuitable for RGB-T tasks. To address these challenges, we propose SHIFNet, a novel SAM2-driven Hybrid Interaction Paradigm that unlocks the potential of SAM2 with linguistic guidance for efficient RGB-Thermal perception. Our framework consists of two key components: (1) a Semantic-Aware Cross-modal Fusion (SACF) module that dynamically balances modality contributions through text-guided affinity learning, overcoming SAM2's inherent RGB bias; and (2) a Heterogeneous Prompting Decoder (HPD) that enhances global semantic information through a semantic enhancement module and then combines it with category embeddings to amplify cross-modal semantic consistency. With 32.27M trainable parameters, SHIFNet achieves state-of-the-art segmentation performance on public benchmarks, reaching 89.8% mIoU on PST900 and 67.8% mIoU on FMB. The framework facilitates the adaptation of pre-trained large models to RGB-T segmentation tasks, effectively mitigating the high costs associated with data collection while endowing robotic systems with comprehensive perception capabilities. The source code will be made publicly available at https://github.com/iAsakiT3T/SHIFNet.
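
The abstract does not spell out how text-guided affinity learning weights the two modalities. As a rough illustration only, and not the SACF module itself, the general idea of letting a text embedding decide how much each modality contributes can be sketched as follows. The function names, the use of cosine similarity, the softmax weighting, and the `temperature` parameter are all assumptions made for this sketch.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def text_guided_fuse(rgb_feat, thermal_feat, text_emb, temperature=1.0):
    """Hypothetical fusion: weight each modality by its affinity to the text.

    This is an illustrative assumption, not the method from the paper.
    """
    # Affinity of each modality's feature vector with the text embedding.
    s_rgb = cosine(rgb_feat, text_emb) / temperature
    s_th = cosine(thermal_feat, text_emb) / temperature
    # Numerically stable two-way softmax over the affinities.
    m = max(s_rgb, s_th)
    e_rgb, e_th = math.exp(s_rgb - m), math.exp(s_th - m)
    w_rgb = e_rgb / (e_rgb + e_th)
    w_th = 1.0 - w_rgb
    # Weighted sum of the two modality features.
    fused = [w_rgb * r + w_th * t for r, t in zip(rgb_feat, thermal_feat)]
    return fused, (w_rgb, w_th)

# Toy example: the text embedding aligns with the thermal feature,
# so the thermal branch receives the larger weight.
fused, (w_rgb, w_th) = text_guided_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

In this toy setting the weights always sum to one, so neither modality is discarded outright; a lower `temperature` would make the weighting sharper.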

Benchmarks

Benchmark                                    Methodology               Metrics
semantic-segmentation-on-fmb-dataset         SHIFNet (RGB-Infrared)    mIoU: 67.8
thermal-image-segmentation-on-mfn-dataset    SHIFNet                   mIoU: 59.2
thermal-image-segmentation-on-pst900         SHIFNet                   mIoU: 89.8
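
The benchmark scores above are reported as mIoU (mean intersection over union). A minimal sketch of the metric on flat label lists is shown below; the function name and the choice to skip classes absent from both prediction and ground truth are assumptions of this sketch, and benchmark toolkits typically accumulate a confusion matrix over the whole test set rather than scoring a single sample.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target (illustrative)."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)  # per-class IoU: 1/3, 2/3, 1/2
```

Here the per-class IoU values average to 0.5, which would be reported as an mIoU of 50.0 on the percentage scale used in the table.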
