5 months ago

GIM: Learning Generalizable Image Matcher From Internet Videos

Shen Xuelun ; Cai Zhipeng ; Yin Wei ; Müller Matthias ; Li Zijun ; Wang Kaixuan ; Chen Xiaozhi ; Wang Cheng

Abstract

Image matching is a fundamental computer vision problem. While learning-based methods achieve state-of-the-art performance on existing benchmarks, they generalize poorly to in-the-wild images. Such methods typically need to train separate models for different scene types and are impractical when the scene type is unknown in advance. One of the underlying problems is the limited scalability of existing data construction pipelines, which limits the diversity of standard image matching datasets. To address this problem, we propose GIM, a self-training framework for learning a single generalizable model based on any image matching architecture using internet videos, an abundant and diverse data source. Given an architecture, GIM first trains it on standard domain-specific datasets and then combines it with complementary matching methods to create dense labels on nearby frames of novel videos. These labels are filtered by robust fitting, and then enhanced by propagating them to distant frames. The final model is trained on propagated data with strong augmentations. We also propose ZEB, the first zero-shot evaluation benchmark for image matching. By mixing data from diverse domains, ZEB can thoroughly assess the cross-domain generalization performance of different methods. Applying GIM consistently improves the zero-shot performance of 3 state-of-the-art image matching architectures; with 50 hours of YouTube videos, the relative zero-shot performance improves by 8.4%-18.1%. GIM also enables generalization to extreme cross-domain data such as Bird Eye View (BEV) images of projected 3D point clouds (Fig. 1(c)). More importantly, our single zero-shot model consistently outperforms domain-specific baselines when evaluated on downstream tasks inherent to their respective domains. The video presentation is available at https://www.youtube.com/watch?v=FU_MJLD8LeY.
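
The abstract's step of propagating labels to distant frames can be illustrated by chaining correspondences: if a point in frame A matches a point in frame B, and that point in B matches a point in frame C, then A and C share a correspondence. The sketch below is only an illustration of that idea, not the authors' implementation. The function name `propagate_matches`, the tuple layout, and the 1-pixel tolerance `tol` are assumptions made for this example.

```python
import math


def propagate_matches(matches_ab, matches_bc, tol=1.0):
    """Chain A->B and B->C matches into A->C matches (illustrative sketch).

    matches_ab: list of (xa, ya, xb, yb) tuples
    matches_bc: list of (xb, yb, xc, yc) tuples
    tol: assumed maximum pixel distance in frame B for two points
         to be treated as the same point
    """
    propagated = []
    if not matches_bc:
        return propagated
    for xa, ya, xb, yb in matches_ab:
        # Find the B->C match whose frame-B point is closest to (xb, yb).
        best = min(matches_bc, key=lambda m: math.hypot(m[0] - xb, m[1] - yb))
        # Keep the chained match only if the two frame-B points coincide
        # within the tolerance; otherwise the chain is broken.
        if math.hypot(best[0] - xb, best[1] - yb) <= tol:
            propagated.append((xa, ya, best[2], best[3]))
    return propagated


if __name__ == "__main__":
    ab = [(0, 0, 10, 10), (5, 5, 20, 20)]
    bc = [(10.5, 10, 30, 30), (50, 50, 60, 60)]
    print(propagate_matches(ab, bc))
```

Repeating this chaining over successive frame pairs would carry labels from nearby frames to frames that are further apart, which is where the larger viewpoint changes come from. The linear scan is quadratic in the number of matches; a spatial index would be the natural replacement for dense labels.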

Code Repositories

xuelunshen/gim (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
image-matching-on-zeb | GIM-RoMa | Mean AUC@5°: 53.3
image-matching-on-zeb | GIM-DKM | Mean AUC@5°: 51.2
image-matching-on-zeb | GIM-LightGlue | Mean AUC@5°: 38.3
image-matching-on-zeb | GIM-LoFTR | Mean AUC@5°: 39.1
pose-estimation-on-inloc | GIM-LoFTR | DUC1-Acc@0.25m,10°: 54.5; DUC1-Acc@0.5m,10°: 78.3; DUC1-Acc@1.0m,10°: 87.4; DUC2-Acc@0.25m,10°: 63.4; DUC2-Acc@0.5m,10°: 83.2; DUC2-Acc@1.0m,10°: 87.0
pose-estimation-on-inloc | GIM-DKM | DUC1-Acc@0.25m,10°: 57.1; DUC1-Acc@0.5m,10°: 78.8; DUC1-Acc@1.0m,10°: 88.4; DUC2-Acc@0.25m,10°: 70.2; DUC2-Acc@0.5m,10°: 91.6; DUC2-Acc@1.0m,10°: 92.4
pose-estimation-on-inloc | GIM-SuperGlue | DUC1-Acc@0.25m,10°: 53.5; DUC1-Acc@0.5m,10°: 76.8; DUC1-Acc@1.0m,10°: 86.9; DUC2-Acc@0.25m,10°: 61.8; DUC2-Acc@0.5m,10°: 85.5; DUC2-Acc@1.0m,10°: 87.8
visual-localization-on-aachen-day-night-v1-1 | GIM-DKM | Acc@0.25m, 2°: 77.0; Acc@0.5m, 5°: 90.1; Acc@5m, 10°: 99.5
visual-localization-on-aachen-day-night-v1-1 | GIM-SuperGlue | Acc@0.25m, 2°: 78.0; Acc@0.5m, 5°: 90.6; Acc@5m, 10°: 100.0
visual-localization-on-aachen-day-night-v1-1 | GIM-LoFTR | Acc@0.25m, 2°: 79.1; Acc@0.5m, 5°: 91.6; Acc@5m, 10°: 100.0
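
The ZEB entries above report Mean AUC@5° without defining it on this page. A common reading in the image matching literature is the area under the cumulative pose-error curve up to a 5° threshold, normalized by the threshold. The sketch below assumes that definition; whether ZEB's evaluation code computes it exactly this way is not confirmed by the source, and the function name `pose_auc` is chosen for this example.

```python
def pose_auc(errors, threshold=5.0):
    """Area under the recall-vs-pose-error curve up to `threshold` degrees,
    normalized to [0, 1] (assumed definition, see lead-in)."""
    if not errors:
        raise ValueError("errors must contain at least one value")
    errs = sorted(errors)
    n = len(errs)
    # Curve points: x = pose error in degrees, y = fraction of image pairs
    # whose error is at most x. The curve starts at the origin.
    xs, ys = [0.0], [0.0]
    for i, e in enumerate(errs):
        if e >= threshold:
            break
        xs.append(float(e))
        ys.append((i + 1) / n)
    # Extend the curve flat to the threshold.
    xs.append(float(threshold))
    ys.append(ys[-1])
    # Trapezoidal integration, then normalize by the threshold.
    area = sum((xs[k + 1] - xs[k]) * (ys[k + 1] + ys[k]) / 2.0
               for k in range(len(xs) - 1))
    return area / threshold


if __name__ == "__main__":
    # Four image pairs with pose errors of 1, 2, 3 and 4 degrees.
    print(round(pose_auc([1.0, 2.0, 3.0, 4.0]), 3))  # 0.6
```

Under this definition a score of 53.3 would correspond to a normalized area of 0.533, and a "Mean" AUC would average that value over the benchmark's constituent datasets.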
