Where in the World is this Image? Transformer-based Geo-localization in the Wild

Shraman Pramanick; Ewa M. Nowara; Joshua Gleason; Carlos D. Castillo; Rama Chellappa

Abstract

Predicting the geographic location of a single ground-level RGB image taken anywhere in the world (geo-localization) is a very challenging problem. The challenges include the huge diversity of images arising from different environmental scenarios; drastic variation in the appearance of the same location depending on the time of day, weather, and season; and, more importantly, the fact that the prediction is made from a single image that may contain only a few geo-locating cues. For these reasons, most existing works are restricted to specific cities, imagery, or worldwide landmarks. In this work, we focus on developing an efficient solution to planet-scale single-image geo-localization. To this end, we propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image and produces robust feature representations under extreme appearance variations. TransLocator takes an RGB image and its semantic segmentation map as inputs, exchanges information between its two parallel branches after each transformer layer, and simultaneously performs geo-localization and scene recognition in a multi-task fashion. We evaluate TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k) and obtain continent-level accuracy improvements of 5.5%, 14.1%, 4.9%, and 9.9%, respectively, over the state of the art. TransLocator is also validated on real-world test images and found to be more effective than previous methods.

Benchmarks

Benchmark | Methodology | Street level (1 km) | City level (25 km) | Region level (200 km) | Country level (750 km) | Continent level (2500 km) | Training Images
--- | --- | --- | --- | --- | --- | --- | ---
photo-geolocation-estimation-on-gws15k | TransLocator | 0.5 | 1.1 | 8.0 | 25.5 | 48.3 | -
photo-geolocation-estimation-on-im2gps3k | TransLocator | 11.8 | 31.1 | 46.7 | 58.9 | 80.1 | 4.7M
photo-geolocation-estimation-on-yfcc26k | TransLocator | 7.2 | 17.8 | 28.0 | 41.3 | 60.6 | 4.7M
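
The benchmark metrics above are reported at five distance thresholds (1 km to 2500 km). As a rough illustration of how such threshold accuracies could be computed, the sketch below scores predicted coordinates against ground truth using great-circle distance. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the haversine formula, the mean Earth radius of 6371 km, the inclusive `<=` comparison, and the names `haversine_km` and `accuracy_at_thresholds` are all chosen here for illustration.

```python
import math

# Assumption: mean Earth radius in km, used for the great-circle distance.
EARTH_RADIUS_KM = 6371.0

# Distance thresholds (km), named as in the benchmark listing.
THRESHOLDS_KM = {
    "street": 1,
    "city": 25,
    "region": 200,
    "country": 750,
    "continent": 2500,
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at_thresholds(predictions, ground_truth):
    """Percentage of predictions whose error is within each threshold.

    Both arguments are equal-length sequences of (lat, lon) pairs in degrees.
    Assumption: a prediction counts as correct when error <= threshold.
    """
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [haversine_km(*p, *g) for p, g in zip(predictions, ground_truth)]
    return {
        name: 100.0 * sum(e <= km for e in errors) / len(errors)
        for name, km in THRESHOLDS_KM.items()
    }


if __name__ == "__main__":
    preds = [(48.8566, 2.3522), (0.0, 0.0)]
    truth = [(48.8566, 2.3522), (0.0, 10.0)]
    print(accuracy_at_thresholds(preds, truth))
```

In this toy input the first prediction is exact and the second is off by 10 degrees of longitude along the equator, so only the continent-level threshold counts both as correct.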
