SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition
Fan Zhaoxin, Song Zhenbo, Liu Hongyan, Lu Zhiwu, He Jun, Du Xiaoyong

Abstract
Point cloud-based large scale place recognition is fundamental for many applications such as Simultaneous Localization and Mapping (SLAM). Although many models have been proposed and achieve good performance by learning short-range local features, long-range contextual properties have often been neglected. Moreover, model size has become a bottleneck for wider application. To overcome these challenges, we propose a super light-weight network model termed SVT-Net for large scale place recognition. Specifically, on top of the highly efficient 3D Sparse Convolution (SP-Conv), an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) are proposed to learn both short-range local features and long-range contextual features. Combining ASVT and CSVT, SVT-Net achieves state-of-the-art results on benchmark datasets in terms of both accuracy and speed with a super-light model size (0.9M parameters). Meanwhile, two simplified versions of SVT-Net are introduced, which also achieve state-of-the-art results while further reducing the model size to 0.8M and 0.4M respectively.
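To make the two attention blocks concrete, below is a minimal, hypothetical PyTorch sketch of what atom-level and cluster-level self-attention over voxel features could look like. It is not the authors' implementation: the real SVT-Net operates on 3D sparse tensors produced by SP-Conv, whereas here the non-empty voxel features of one point cloud are assumed to be a plain `(N, C)` tensor, and the class names, head counts, and cluster counts are illustrative choices.

```python
# Illustrative sketch only: a simplified take on ASVT/CSVT-style attention.
# Voxel features are assumed to already be gathered into a dense (N, C) tensor.
import torch
import torch.nn as nn


class AtomSelfAttention(nn.Module):
    """ASVT-style block: self-attention directly among voxel ("atom") features."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C)
        h = x.unsqueeze(0)                       # (1, N, C): one "batch" of voxels
        out, _ = self.attn(h, h, h)              # long-range interaction across all voxels
        return self.norm(x + out.squeeze(0))     # residual + norm


class ClusterSelfAttention(nn.Module):
    """CSVT-style block: softly assign voxels to a few latent clusters, attend
    among the clusters, then redistribute the cluster context to every voxel."""
    def __init__(self, dim: int, num_clusters: int = 8, heads: int = 4):
        super().__init__()
        self.assign = nn.Linear(dim, num_clusters)     # soft cluster assignment
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C)
        a = self.assign(x).softmax(dim=0)                 # (N, K) voxel weights per cluster
        clusters = a.t() @ x                              # (K, C) cluster descriptors
        c = clusters.unsqueeze(0)
        ctx, _ = self.attn(c, c, c)                       # attention among clusters
        back = a @ ctx.squeeze(0)                         # (N, C) context back to voxels
        return self.norm(x + back)


if __name__ == "__main__":
    feats = torch.randn(1024, 64)            # 1024 occupied voxels, 64-dim features
    feats = AtomSelfAttention(64)(feats)
    feats = ClusterSelfAttention(64)(feats)
    print(feats.shape)                        # torch.Size([1024, 64])
```

The design intuition follows the abstract: the atom-level block captures relations among individual voxels, while the cluster-level block keeps attention cheap by reasoning over a small number of pooled descriptors, which suits a very small overall model.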
Benchmarks
| Benchmark | Methodology | AR@1 | AR@1% |
|---|---|---|---|
| 3d-place-recognition-on-oxford-robotcar | SVT-Net | 93.7 | 97.8 |