Self-positioning Point-based Transformer for Point Cloud Understanding
Jinyoung Park; Sanghyeok Lee; Sihyeon Kim; Yunyang Xiong; Hyunwoo J. Kim

Abstract
Transformers have shown superior performance on various computer vision tasks with their capabilities to capture long-range dependencies. Despite the success, it is challenging to directly apply Transformers to point clouds due to their quadratic cost in the number of points. In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and global shape contexts with reduced complexity. Specifically, this architecture consists of local self-attention and self-positioning point-based global cross-attention. The self-positioning points, adaptively located based on the input shape, consider both spatial and semantic information with disentangled attention to improve expressive power. With the self-positioning points, we propose a novel global cross-attention mechanism for point clouds, which improves the scalability of global self-attention by allowing the attention module to compute attention weights with only a small set of self-positioning points. Experiments show the effectiveness of SPoTr on three point cloud tasks: shape classification, part segmentation, and scene segmentation. In particular, our proposed model achieves an accuracy gain of 2.6% over the previous best models on shape classification with ScanObjectNN. We also provide qualitative analyses to demonstrate the interpretability of self-positioning points. The code of SPoTr is available at https://github.com/mlvlab/SPoTr.
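The scalability claim above hinges on replacing the N x N attention matrix of full self-attention with an N x M matrix, where M is a small set of self-positioning points. The following is a minimal sketch of that idea only, not the paper's exact formulation (which additionally uses disentangled spatial/semantic attention and adaptively locates the self-positioning points from the input shape); all names and shapes here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_cross_attention(x, sp):
    """Cross-attention between N input point features x (N, d) and
    M self-positioning point features sp (M, d), with M << N.
    The attention matrix is (N, M), so the cost is O(N*M) per layer
    rather than the O(N^2) of full self-attention."""
    d = x.shape[-1]
    attn = softmax(x @ sp.T / np.sqrt(d), axis=-1)  # (N, M) weights
    return attn @ sp                                # (N, d) output

# toy example: 1024 points attend to only 16 self-positioning points
rng = np.random.default_rng(0)
N, M, d = 1024, 16, 32
x = rng.standard_normal((N, d))
sp = rng.standard_normal((M, d))
out = global_cross_attention(x, sp)
print(out.shape)  # (1024, 32)
```

With M fixed (e.g. 16) the memory and compute of this module grow linearly in the number of input points, which is what lets a global attention pathway scale to dense point clouds.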
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-part-segmentation-on-shapenet-part | SPoTr | Class Average IoU: 85.4, Instance Average IoU: 87.2 |
| 3d-point-cloud-classification-on-scanobjectnn | SPoTr | Mean Accuracy: 86.8, Overall Accuracy: 88.6 |
| semantic-segmentation-on-s3dis-area5 | SPoTr | Number of params: N/A, mAcc: 76.4, mIoU: 70.8, oAcc: 90.7 |
| supervised-only-3d-point-cloud-classification | SPoTr | GFLOPs: 10.8, Number of params (M): 1.7, Overall Accuracy (PB_T50_RS): 88.6 |