Attention-Based Multimodal Image Matching

Moreshet Aviad ; Keller Yosi

Abstract

We propose an attention-based approach for multimodal image patch matching using a Transformer encoder attending to the feature maps of a multiscale Siamese CNN. Our encoder is shown to efficiently aggregate multiscale image embeddings while emphasizing task-specific appearance-invariant image cues. We also introduce an attention-residual architecture, using a residual connection bypassing the encoder. This additional learning signal facilitates end-to-end training from scratch. Our approach is experimentally shown to achieve new state-of-the-art accuracy on both multimodal and single modality benchmarks, illustrating its general applicability. To the best of our knowledge, this is the first successful implementation of the Transformer encoder architecture to the multimodal image patch matching task.
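The attention-residual idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single attention head with identity projections (no learned weights), mean pooling, and a residual that is simply added to the pooled encoder output. The function names and the tiny hand-written "multiscale" token vectors are hypothetical stand-ins for CNN feature maps.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Single-head scaled dot-product self-attention.
    # Assumption: queries, keys and values are the tokens themselves.
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(dim)])
    return out

def mean_pool(tokens):
    # Average the token vectors into one embedding.
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]

def attention_residual_embedding(multiscale_tokens):
    # Pooled attention output plus a pooled bypass of the attention input.
    # Assumption: the residual is combined by addition.
    attended = mean_pool(self_attention(multiscale_tokens))
    bypass = mean_pool(multiscale_tokens)
    return [a + b for a, b in zip(attended, bypass)]

def l2_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Siamese use: the same function embeds both patches of a pair.
tokens_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical features, patch A
tokens_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical features, patch B
emb_a = attention_residual_embedding(tokens_a)
emb_b = attention_residual_embedding(tokens_b)
print(l2_distance(emb_a, emb_b))
```

Because the bypass term does not pass through the attention step, it gives the embedding a direct path from the input features, which is one plausible reading of the "additional learning signal" the abstract mentions.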

Benchmarks

Benchmark | Methodology | Metrics
multimodal-patch-matching-on-visnir | Multiscale Transformer Encoder | FPR95: 1.44
patch-matching-on-brown-dataset | Multiscale Transformer Encoder | FPR95: 0.9
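FPR95 is commonly read as the false positive rate at 95% recall: the fraction of non-matching pairs accepted at the distance threshold that accepts 95% of the matching pairs. The sketch below shows one way to compute it, assuming lower distance means a better match and labels of 1 for matching and 0 for non-matching pairs. The function name and the sample data are hypothetical, and the page does not state whether its values are percentages.

```python
import math

def fpr_at_recall(distances, labels, recall=0.95):
    # Split pair distances by ground-truth label.
    pos = sorted(d for d, y in zip(distances, labels) if y == 1)
    neg = [d for d, y in zip(distances, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one matching and one non-matching pair")
    # Smallest threshold that accepts at least `recall` of the matching pairs.
    # The small epsilon guards against floating-point round-up in the product.
    k = max(1, math.ceil(recall * len(pos) - 1e-9))
    threshold = pos[k - 1]
    # Non-matching pairs at or below the threshold are false positives.
    false_positives = sum(1 for d in neg if d <= threshold)
    return false_positives / len(neg)

# Hypothetical distances: four matching pairs, four non-matching pairs.
distances = [0.1, 0.2, 0.3, 0.4, 0.35, 0.5, 0.6, 0.7]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
rate = fpr_at_recall(distances, labels)
print(rate)
```

The function returns a fraction in [0, 1]; multiply by 100 to compare with figures reported as percentages.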
