SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition
Fan Zhaoxin, Song Zhenbo, Liu Hongyan, Lu Zhiwu, He Jun, Du Xiaoyong

Abstract
Point cloud-based large scale place recognition is fundamental for many applications such as Simultaneous Localization and Mapping (SLAM). Although many models have been proposed and achieve good performance by learning short-range local features, long-range contextual properties have often been neglected. Moreover, model size has become a bottleneck for wider application. To overcome these challenges, we propose a super light-weight network model termed SVT-Net for large scale place recognition. Specifically, on top of the highly efficient 3D Sparse Convolution (SP-Conv), an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) are proposed to learn both short-range local features and long-range contextual features. Combining ASVT and CSVT, SVT-Net achieves state-of-the-art results on benchmark datasets in terms of both accuracy and speed with a super-light model size (0.9M parameters). Meanwhile, two simplified versions of SVT-Net are introduced, which also achieve state-of-the-art results while further reducing the model size to 0.8M and 0.4M respectively.
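To make the two attention blocks concrete, below is a minimal, hypothetical PyTorch sketch of what atom-level and cluster-level self-attention over voxel features could look like. It is not the authors' implementation: the real SVT-Net operates on 3D sparse tensors produced by SP-Conv, whereas here the non-empty voxel features of one point cloud are assumed to be a plain `(N, C)` tensor, and the class names, head counts, and cluster counts are illustrative choices.

```python
# Illustrative sketch only: a simplified take on ASVT/CSVT-style attention.
# Voxel features are assumed to already be gathered into a dense (N, C) tensor.
import torch
import torch.nn as nn


class AtomSelfAttention(nn.Module):
    """ASVT-style block: self-attention directly among voxel ("atom") features."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C)
        h = x.unsqueeze(0)                       # (1, N, C): one "batch" of voxels
        out, _ = self.attn(h, h, h)              # long-range interaction across all voxels
        return self.norm(x + out.squeeze(0))     # residual + norm


class ClusterSelfAttention(nn.Module):
    """CSVT-style block: softly assign voxels to a few latent clusters, attend
    among the clusters, then redistribute the cluster context to every voxel."""
    def __init__(self, dim: int, num_clusters: int = 8, heads: int = 4):
        super().__init__()
        self.assign = nn.Linear(dim, num_clusters)     # soft cluster assignment
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C)
        a = self.assign(x).softmax(dim=0)                 # (N, K) voxel weights per cluster
        clusters = a.t() @ x                              # (K, C) cluster descriptors
        c = clusters.unsqueeze(0)
        ctx, _ = self.attn(c, c, c)                       # attention among clusters
        back = a @ ctx.squeeze(0)                         # (N, C) context back to voxels
        return self.norm(x + back)


if __name__ == "__main__":
    feats = torch.randn(1024, 64)            # 1024 occupied voxels, 64-dim features
    feats = AtomSelfAttention(64)(feats)
    feats = ClusterSelfAttention(64)(feats)
    print(feats.shape)                        # torch.Size([1024, 64])
```

The design intuition follows the abstract: the atom-level block captures relations among individual voxels, while the cluster-level block keeps attention cheap by reasoning over a small number of pooled descriptors, which suits a very small overall model.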
Benchmarks
| Benchmark | Methodology | AR@1 | AR@1% |
|---|---|---|---|
| 3d-place-recognition-on-oxford-robotcar | SVT-Net | 93.7 | 97.8 |