
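Abstract
Successfully estimating a photo's geolocation enables a number of interesting applications, but the task itself is very challenging. Because of the complexity of the problem, most existing approaches are restricted to specific regions, specific types of imagery, or worldwide landmarks. Only a few proposals predict GPS coordinates without any such restriction. In this paper, we present several deep learning methods that follow the latter approach and treat geolocation as a classification problem in which the surface of the earth is subdivided into geographical cells. We additionally exploit the hierarchical knowledge contained in multiple partitionings and take the scene content of the photo into account, for example indoor, natural, or urban settings. As a result, the convolutional neural network incorporates contextual information at different spatial resolutions, as well as more specific features for different environmental settings, during learning. Experimental results on two benchmark datasets show that the proposed approach outperforms the state of the art while using significantly fewer training images and without relying on retrieval methods, which require an appropriate reference dataset.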
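The classification formulation described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a regular latitude/longitude grid stands in for the actual partitioning, the function names `cell_index`, `parent_of`, and `combine_levels` are hypothetical, and multiplying a fine cell's probability by that of its coarse parent is only one plausible way of using hierarchical knowledge.

```python
def cell_index(lat, lon, cells_per_side):
    """Map a coordinate to a class id on a regular lat/lon grid.

    Assumption: a uniform grid is an illustrative stand-in, not the
    partitioning used in the paper.
    """
    row = min(int((lat + 90.0) / 180.0 * cells_per_side), cells_per_side - 1)
    col = min(int((lon + 180.0) / 360.0 * cells_per_side), cells_per_side - 1)
    return row * cells_per_side + col


def parent_of(fine_id, fine_side, coarse_side):
    """Return the coarse cell that contains a given fine cell.

    Requires fine_side to be a multiple of coarse_side.
    """
    factor = fine_side // coarse_side
    row, col = divmod(fine_id, fine_side)
    return (row // factor) * coarse_side + (col // factor)


def combine_levels(fine_probs, coarse_probs, fine_side, coarse_side):
    """Re-weight fine-cell probabilities by their parent's probability.

    Assumption: a simple product followed by renormalization.
    """
    scores = {
        fine_id: p * coarse_probs.get(parent_of(fine_id, fine_side, coarse_side), 0.0)
        for fine_id, p in fine_probs.items()
    }
    total = sum(scores.values())
    if total == 0:
        return scores
    return {fine_id: s / total for fine_id, s in scores.items()}


# Hypothetical classifier outputs for a photo taken in Paris.
fine_id = cell_index(48.8566, 2.3522, cells_per_side=8)
coarse_id = cell_index(48.8566, 2.3522, cells_per_side=4)
combined = combine_levels({fine_id: 0.4, 0: 0.6}, {coarse_id: 0.9, 0: 0.1}, 8, 4)
best_cell = max(combined, key=combined.get)
```

In this toy example the coarse level overrules the fine level: the fine cell with the lower raw probability wins because its parent cell is far more likely. This is the kind of effect that combining several spatial resolutions is meant to achieve.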
Benchmarks
| Benchmark | Method | Street level (1 km) | City level (25 km) | Region level (200 km) | Country level (750 km) | Continent level (2500 km) | Training images | Reference images |
|---|---|---|---|---|---|---|---|---|
| photo-geolocation-estimation-on-gws15k | ISNs (M, f*, S3) | 0.05 | 0.6 | 4.2 | 15.5 | 38.5 | - | - |
| photo-geolocation-estimation-on-im2gps | base (L, m) | 13.5 | 35.0 | 49.8 | 64.1 | 79.7 | 4.7M | 0 |
| photo-geolocation-estimation-on-im2gps | ISNs (M, f*, S3) | 16.9 | 43.0 | 51.9 | 66.7 | 80.2 | 4.7M | 0 |
| photo-geolocation-estimation-on-im2gps | base (M, f*) | 15.2 | 40.9 | 51.5 | 65.4 | 78.5 | 4.7M | 0 |
| photo-geolocation-estimation-on-im2gps3k | ISNs (M, f*, S3) | 10.5 | 28.0 | 36.6 | 49.7 | 66.0 | 4.7M | - |
| photo-geolocation-estimation-on-im2gps3k | base (M, f*) | 9.7 | 27.0 | 35.6 | 49.2 | 66.0 | 4.7M | - |
| photo-geolocation-estimation-on-im2gps3k | base (L, m) | 8.3 | 24.9 | 34.0 | 48.8 | 65.8 | 4.7M | - |
| photo-geolocation-estimation-on-yfcc26k | ISNs (M, f*, S3) | 5.3 | 12.3 | 19.0 | 31.9 | 50.7 | 4.7M | - |
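The distance thresholds in the table above (1 km, 25 km, 200 km, 750 km, 2500 km) suggest an accuracy-within-radius metric. The sketch below shows how such numbers could be computed, assuming each value is the percentage of test images whose predicted location lies within the given great-circle distance of the ground truth. That reading, the haversine formula with a mean earth radius of 6371 km, and the names `great_circle_km` and `accuracy_at_scales` are assumptions, not details taken from the source.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius (assumption: spherical earth)

# Threshold names and radii as they appear in the benchmark table.
THRESHOLDS_KM = {
    "street": 1,
    "city": 25,
    "region": 200,
    "country": 750,
    "continent": 2500,
}


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at_scales(predictions, ground_truth):
    """Percentage of predictions within each distance threshold."""
    errors = [great_circle_km(*p, *g) for p, g in zip(predictions, ground_truth)]
    return {
        name: 100.0 * sum(e <= km for e in errors) / len(errors)
        for name, km in THRESHOLDS_KM.items()
    }


# Two hypothetical test images: one exact hit, one about 1112 km off.
predicted = [(48.8566, 2.3522), (0.0, 0.0)]
actual = [(48.8566, 2.3522), (0.0, 10.0)]
scores = accuracy_at_scales(predicted, actual)
```

Under this reading the values in each row cannot decrease from the street level to the continent level, which is consistent with every row of the table.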