Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance

Abstract
The perception capability of robotic systems relies on the richness of the dataset. Although Segment Anything Model 2 (SAM2), trained on large datasets, demonstrates strong potential in perception tasks, its inherent training paradigm makes it unsuitable for RGB-T tasks. To address this limitation, we propose SHIFNet, a novel SAM2-driven Hybrid Interaction Paradigm that unlocks the potential of SAM2 with linguistic guidance for efficient RGB-Thermal perception. Our framework consists of two key components: (1) a Semantic-Aware Cross-modal Fusion (SACF) module that dynamically balances modality contributions through text-guided affinity learning, overcoming SAM2's inherent RGB bias; and (2) a Heterogeneous Prompting Decoder (HPD) that enhances global semantic information through a semantic enhancement module and then combines it with category embeddings to amplify cross-modal semantic consistency. With 32.27M trainable parameters, SHIFNet achieves state-of-the-art segmentation performance on public benchmarks, reaching 89.8% mIoU on PST900 and 67.8% mIoU on FMB. The framework facilitates the adaptation of pre-trained large models to RGB-T segmentation tasks, effectively mitigating the high costs associated with data collection while endowing robotic systems with comprehensive perception capabilities. The source code will be made publicly available at https://github.com/iAsakiT3T/SHIFNet.
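To make the idea of balancing modality contributions with text guidance more concrete, here is a minimal sketch. It is not the SACF module from the paper: the function names (`cosine`, `softmax`, `text_guided_fusion`), the use of cosine similarity as the affinity, and the `temperature` parameter are all assumptions chosen for illustration. Features are plain Python lists standing in for pooled RGB, thermal, and text embeddings of equal length.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_fusion(rgb_feat, thermal_feat, text_emb, temperature=1.0):
    """Hypothetical fusion: weight each modality by its affinity to the text embedding.

    Assumption: affinity is cosine similarity and the weights come from a
    softmax over the two affinities. The paper's actual design may differ.
    """
    affinities = [cosine(rgb_feat, text_emb), cosine(thermal_feat, text_emb)]
    w_rgb, w_thermal = softmax([a / temperature for a in affinities])
    fused = [w_rgb * r + w_thermal * t for r, t in zip(rgb_feat, thermal_feat)]
    return fused, (w_rgb, w_thermal)


if __name__ == "__main__":
    # Toy vectors: the text embedding is aligned with the thermal feature.
    fused, weights = text_guided_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
    print("weights (rgb, thermal):", weights)
    print("fused feature:", fused)
```

In this toy setup the thermal feature receives the larger weight because it is better aligned with the text embedding, which is the kind of behavior that would counteract a fixed bias toward RGB.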
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| semantic-segmentation-on-fmb-dataset | SHIFNet (RGB-Infrared) | mIoU: 67.8 |
| thermal-image-segmentation-on-mfn-dataset | SHIFNet | mIoU: 59.2 |
| thermal-image-segmentation-on-pst900 | SHIFNet | mIoU: 89.8 |
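
The metric in the table, mean Intersection over Union (mIoU), averages the per-class ratio of correctly predicted pixels to the union of predicted and ground-truth pixels. The sketch below computes it for flat lists of class labels. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions; benchmark toolkits usually accumulate a confusion matrix over the whole test set instead of scoring one image at a time.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred and target are equal-length sequences of integer class labels
    (for example, flattened segmentation masks).
    """
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks (assumption)
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 1]
    # Class 0: IoU = 1/2, class 1: IoU = 2/3
    print(f"mIoU: {100 * mean_iou(pred, target, num_classes=2):.1f}")
```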