
Abstract
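Image geolocalization, inferring the geographic location of an image, is a challenging computer vision problem with many potential applications. The recent state-of-the-art approach to this problem is a deep image classification approach in which the world is spatially divided into cells and a deep network is trained to predict the correct cell for a given image. We propose to combine this approach with the original Im2GPS approach, in which a query image is matched against a database of geotagged images and the location is inferred from the retrieved set. We estimate the geographic location of a query image by applying kernel density estimation to the locations of its nearest neighbors in the reference database. Interestingly, we find that the best features for our retrieval task are derived from networks trained with a classification loss, even though we do not use a classification approach at test time. Training with a classification loss outperforms several deep feature learning methods more typically used for retrieval applications (e.g., Siamese networks with contrastive or triplet loss). Our simple approach achieves state-of-the-art geolocalization accuracy while also requiring significantly less training data.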
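
The retrieval step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `locate_by_knn_kde`, the cosine-similarity ranking, the unweighted Gaussian kernel over great-circle distance, and the defaults `k=5` and `bandwidth_km=100.0` are all assumptions made here. For simplicity, the density is evaluated only at the neighbors' own locations.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs are degrees and may be broadcastable arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def locate_by_knn_kde(query_feat, ref_feats, ref_latlon, k=5, bandwidth_km=100.0):
    """Return the (lat, lon) of the retrieved neighbor with the highest kernel density.

    query_feat: (d,) feature vector of the query image (hypothetical input).
    ref_feats:  (n, d) feature vectors of the geotagged reference images.
    ref_latlon: (n, 2) array of reference locations in degrees.
    """
    query_feat = np.asarray(query_feat, dtype=float)
    ref_feats = np.asarray(ref_feats, dtype=float)
    ref_latlon = np.asarray(ref_latlon, dtype=float)

    # Rank reference images by cosine similarity to the query (assumed metric).
    q = query_feat / np.linalg.norm(query_feat)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    nn = np.argsort(-(r @ q))[:k]

    # Pairwise great-circle distances between the k retrieved locations.
    lat, lon = ref_latlon[nn, 0], ref_latlon[nn, 1]
    dist = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])

    # Gaussian kernel density at each neighbor location (uniform weights assumed).
    density = np.exp(-0.5 * (dist / bandwidth_km) ** 2).sum(axis=1)

    best = nn[np.argmax(density)]
    return float(ref_latlon[best, 0]), float(ref_latlon[best, 1])
```

With `k=1` the sketch reduces to plain nearest-neighbor lookup. With larger `k`, a cluster of moderately similar neighbors in one area can outvote a single, more similar outlier elsewhere.
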
Benchmarks
| Benchmark | Method | Street level (1 km) | City level (25 km) | Region level (200 km) | Country level (750 km) | Continent level (2500 km) | Training images | Reference images |
|---|---|---|---|---|---|---|---|---|
| photo-geolocation-estimation-on-im2gps | Im2GPS ([L] KNN, sigma=4) | 12.2 | 33.3 | 44.3 | 57.4 | 71.3 | 6M | 0 |
| photo-geolocation-estimation-on-im2gps | Im2GPS (... 28m database) | 14.4 | 33.3 | 47.7 | 61.6 | 73.4 | 6M | 28M |
| photo-geolocation-estimation-on-im2gps | Im2GPS ([L] 7011C) | 6.8 | 21.9 | 34.6 | 49.4 | 63.7 | 6M | 0 |
| photo-geolocation-estimation-on-im2gps3k | Im2GPS (kNN, sigma = 4) | 7.2 | 19.4 | 26.9 | 38.9 | 55.9 | 6M | not listed |
| photo-geolocation-estimation-on-im2gps3k | Im2GPS ([M] 7011C) | 3.7 | 14.2 | 21.3 | 33.5 | 52.7 | 6M | not listed |
| photo-geolocation-estimation-on-im2gps3k | Im2GPS ([L] 7011C) | 4.0 | 14.8 | 21.4 | 32.6 | 52.4 | 6M | not listed |
| photo-geolocation-estimation-on-yfcc4k | [L]kNN, σ = 4 | 2.3 | 5.7 | 11.0 | 23.5 | 42.0 | not listed | not listed |
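
The distance thresholds in the table (1 km street level through 2500 km continent level) can be evaluated with a short sketch like the one below. It assumes the table values are percentages of query images whose predicted location falls within each threshold of the ground truth, measured by great-circle distance. The function name `accuracy_at_thresholds` and the spherical-Earth radius are illustrative choices, not taken from the source.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

# Threshold names and distances as they appear in the benchmark table.
THRESHOLDS_KM = {
    "street": 1,
    "city": 25,
    "region": 200,
    "country": 750,
    "continent": 2500,
}


def accuracy_at_thresholds(pred_latlon, true_latlon):
    """Percentage of predictions within each distance threshold of the ground truth.

    pred_latlon, true_latlon: (n, 2) arrays of (lat, lon) in degrees.
    """
    pred = np.radians(np.asarray(pred_latlon, dtype=float))
    true = np.radians(np.asarray(true_latlon, dtype=float))

    # Haversine great-circle distance between each prediction and its target.
    dlat = true[:, 0] - pred[:, 0]
    dlon = true[:, 1] - pred[:, 1]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(pred[:, 0]) * np.cos(true[:, 0]) * np.sin(dlon / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    return {name: 100.0 * float(np.mean(dist_km <= km))
            for name, km in THRESHOLDS_KM.items()}
```

Because the thresholds are nested, the accuracies are non-decreasing from street level to continent level, which matches the pattern in every row of the table.
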