Sergio Izquierdo, Javier Civera

Abstract
The task of Visual Place Recognition (VPR) aims to match a query image against references from an extensive database of images from different places, relying solely on visual cues. State-of-the-art pipelines focus on the aggregation of features extracted from a deep backbone, in order to form a global descriptor for each image. In this context, we introduce SALAD (Sinkhorn Algorithm for Locally Aggregated Descriptors), which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem. In SALAD, we consider both feature-to-cluster and cluster-to-feature relations and we also introduce a 'dustbin' cluster, designed to selectively discard features deemed non-informative, enhancing the overall descriptor quality. Additionally, we leverage and fine-tune DINOv2 as a backbone, which provides enhanced description power for the local features, and dramatically reduces the required training time. As a result, our single-stage method not only surpasses single-stage baselines in public VPR datasets, but also surpasses two-stage methods that add a re-ranking with significantly higher cost. Code and models are available at https://github.com/serizba/salad.
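To make the optimal-transport view of soft assignment more concrete, here is a minimal NumPy sketch of log-domain Sinkhorn normalization with an extra dustbin column. It is an illustration under stated assumptions, not the authors' implementation: the function name `sinkhorn_with_dustbin`, the constant `dustbin_score`, the iteration count, and the choice of marginals (unit mass per feature, unit mass per cluster, remaining mass to the dustbin) are all hypothetical choices made for this example. The linked repository is the reference for what SALAD actually does.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sinkhorn_with_dustbin(scores, dustbin_score=1.0, n_iters=50):
    """Soft-assign n local features to m clusters plus one dustbin.

    scores: (n, m) array of feature-to-cluster affinities, with n > m.
    Returns (assignment, dustbin), where assignment has shape (n, m) and
    dustbin has shape (n,). Each feature's total mass across clusters and
    dustbin is 1, so mass sent to the dustbin is effectively discarded.
    """
    n, m = scores.shape
    if n <= m:
        raise ValueError("this sketch assumes more features than clusters")

    # Append the dustbin as an extra column with a constant score (assumption).
    aug = np.concatenate([scores, np.full((n, 1), dustbin_score)], axis=1)

    # Assumed marginals: every feature sends out mass 1; every real cluster
    # receives mass 1 and the dustbin absorbs the remaining n - m.
    log_a = np.zeros(n)
    log_b = np.concatenate([np.zeros(m), [np.log(n - m)]])

    u = np.zeros(n)
    v = np.zeros(m + 1)
    for _ in range(n_iters):
        # Column step (cluster-to-feature), then row step (feature-to-cluster).
        v = log_b - _logsumexp(aug + u[:, None], axis=0)
        u = log_a - _logsumexp(aug + v[None, :], axis=1)

    plan = np.exp(aug + u[:, None] + v[None, :])
    return plan[:, :m], plan[:, m]
```

Because the loop ends on the row step, each feature's weights over the clusters and the dustbin sum to 1; dropping the dustbin column then leaves rows that sum to at most 1, which is how a feature judged non-informative can contribute little to the final descriptor.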
Benchmarks
| Benchmark | Method | Recall@1 | Recall@5 | Recall@10 |
|---|---|---|---|---|
| Mapillary (test) | DINOv2 SALAD | 75 | 88.8 | 91.3 |
| Mapillary (val) | DINOv2 SALAD | 92.2 | 96.4 | 97 |
| Nordland | DINOv2 SALAD (1-frame thr.) | 85.2 | 98.5 | - |
| Pittsburgh-250k | DINOv2 SALAD | 95.1 | 98.5 | 99.1 |
| SPED | DINOv2 SALAD | 92.1 | 96.2 | 96.5 |
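The Recall@K figures above count a query as correct when at least one of its top-K retrieved references is a true match. The sketch below shows one way to compute that percentage. The function name `recall_at_k` and its input layout are assumptions made for illustration, and what counts as a true match (for example a distance threshold, or a frame tolerance as on Nordland) is defined by each benchmark and is taken here as given.

```python
def recall_at_k(ranked_ids, ground_truth, ks=(1, 5, 10)):
    """Percentage of queries with at least one correct match in the top K.

    ranked_ids: one list per query of retrieved reference ids, best first.
    ground_truth: one set per query of the reference ids that count as correct.
    """
    if not ranked_ids:
        raise ValueError("need at least one query")
    if len(ranked_ids) != len(ground_truth):
        raise ValueError("ranked_ids and ground_truth must have the same length")

    hits = {k: 0 for k in ks}
    for retrieved, positives in zip(ranked_ids, ground_truth):
        for k in ks:
            # A query is a hit at K if any of its first K results is correct.
            if any(ref in positives for ref in retrieved[:k]):
                hits[k] += 1

    return {k: 100.0 * hits[k] / len(ranked_ids) for k in ks}
```

Recall@K can only grow with K for a fixed ranking, which is why every row in the table rises from Recall@1 to Recall@10.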