Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval

Cong Bai, Qing Ma, Jiancheng Pan

Abstract

Remote sensing cross-modal retrieval has recently attracted considerable research attention. However, the unique nature of remote sensing images creates many semantic confusion zones in the semantic space, which substantially degrades retrieval performance. We propose a novel scene-aware aggregation network (SWAN) to reduce semantic confusion by improving scene perception capability. For visual representation, a visual multiscale fusion module (VMSF) fuses visual features at different scales and serves as the visual representation backbone. Meanwhile, a scene fine-grained sensing module (SFGS) establishes associations among salient features at different granularities. The visual information generated by these two modules forms a scene-aware visual aggregation representation. For textual representation, a textual coarse-grained enhancement module (TCGE) is designed to enhance the semantics of the text and align it with visual information. Furthermore, because the diversity and differentiation of remote sensing scenes weaken scene understanding, we propose a new metric, scene recall, which measures scene perception by evaluating scene-level retrieval performance and can also verify the effectiveness of our approach in reducing semantic confusion. Through performance comparisons, ablation studies, and visualization analysis, we validate the effectiveness and superiority of our approach on two datasets, RSICD and RSITMD. The source code is available at https://github.com/kinshingpoon/SWAN-pytorch.
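The abstract does not spell out how scene recall is computed. As a rough illustration only, the sketch below takes one plausible reading: a query counts as a hit if any of its top-k retrieved items belongs to the same scene category as the query. The function name `scene_recall_at_k`, the input layout, and the toy scene labels are all hypothetical and are not taken from the paper or its repository.

```python
def scene_recall_at_k(ranked_scene_labels, query_scene_labels, k):
    """Fraction of queries whose top-k results contain the query's scene.

    ranked_scene_labels[i] lists the scene labels of the items retrieved
    for query i, best match first. This is an assumed, simplified reading
    of a scene-level metric, not the paper's exact definition.
    """
    hits = 0
    for ranked, query_scene in zip(ranked_scene_labels, query_scene_labels):
        # A hit only requires a scene-level match, not the exact paired item.
        if query_scene in ranked[:k]:
            hits += 1
    return hits / len(query_scene_labels)


# Toy data: three queries, each with three retrieved items.
ranked = [
    ["airport", "beach", "farmland"],
    ["beach", "beach", "port"],
    ["forest", "river", "port"],
]
queries = ["airport", "port", "desert"]

scene_r1 = scene_recall_at_k(ranked, queries, 1)
scene_r3 = scene_recall_at_k(ranked, queries, 3)
```

Under this reading, a retrieval that returns the wrong image from the right scene still counts as scene-level success, which is what separates the metric from instance-level recall.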

Benchmarks

Benchmark | Methodology | Metrics
cross-modal-retrieval-on-rsicd | SWAN | Image-to-text R@1: 7.41%; Text-to-image R@1: 5.56%; Mean Recall: 20.61%
cross-modal-retrieval-on-rsitmd | SWAN | Image-to-text R@1: 13.35%; Text-to-image R@1: 11.24%; Mean Recall: 34.11%
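For readers unfamiliar with the metrics in this table, the following is a minimal sketch of how Recall@K is typically computed from a query-by-candidate similarity matrix. The page does not define Mean Recall; in this line of work it is commonly the average of R@1, R@5, and R@10 over both retrieval directions, and that is treated here as an assumption. The function names and the toy similarity values are illustrative only.

```python
def recall_at_k(similarity, ground_truth, k):
    """Percentage of queries with at least one correct item in the top k.

    similarity[i][j] is the score between query i and candidate j;
    ground_truth[i] is the set of candidate indices that match query i.
    """
    hits = 0
    for scores, relevant in zip(similarity, ground_truth):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if relevant & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


def mean_recall(recalls):
    """Average of a list of recall values (assumed: R@1/5/10, both directions)."""
    return sum(recalls) / len(recalls)


# Toy example: 3 queries against 3 candidates, one correct match each.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
gt = [{0}, {1}, {2}]

r1 = recall_at_k(sim, gt, 1)
r2 = recall_at_k(sim, gt, 2)
mr = mean_recall([r1, r2])
```

The same function serves both directions: for text-to-image retrieval, the rows are captions and the columns are images, and for image-to-text the matrix is transposed with ground-truth sets that may contain several captions per image.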