MaskPoint: Masked Discrimination for Self-Supervised Learning on Point Clouds
Haotian Liu, Mu Cai, Yong Jae Lee

Abstract
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains. However, mask-based pretraining has yet to show benefits for point cloud understanding, likely due to standard backbones like PointNet being unable to properly handle the training-versus-testing distribution mismatch introduced by masking during training. In this paper, we bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds. Our key idea is to represent the point cloud as discrete occupancy values (1 if part of the point cloud; 0 if not), and perform simple binary classification between masked object points and sampled noise points as the proxy task. In this way, our approach is robust to the point sampling variance in point clouds, and facilitates learning rich representations. We evaluate our pretrained models across several downstream tasks, including 3D shape classification, segmentation, and real-world object detection, and demonstrate state-of-the-art results while achieving a significant pretraining speedup (e.g., 4.1x on ScanNet) compared to the prior state-of-the-art Transformer baseline. Code is available at https://github.com/haotian-liu/MaskPoint.
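The proxy task described above can be illustrated with a minimal sketch of how the binary classification targets might be constructed. This is a hypothetical NumPy illustration, not the authors' implementation: the function name `make_discriminative_targets` and the sample counts are assumptions, and real queries are drawn from the point cloud (occupancy 1) while fake queries are drawn uniformly from its bounding box (occupancy 0).

```python
import numpy as np

def make_discriminative_targets(points, num_real=64, num_fake=64, seed=None):
    """Hypothetical sketch of the binary-occupancy proxy task:
    build a set of query points labeled 1 (sampled from the object)
    or 0 (noise sampled in the object's bounding box)."""
    rng = np.random.default_rng(seed)
    # Real queries: points drawn from the object surface -> occupancy 1.
    idx = rng.choice(len(points), size=num_real, replace=False)
    real_queries = points[idx]
    # Fake queries: uniform noise inside the axis-aligned bounding box -> occupancy 0.
    lo, hi = points.min(axis=0), points.max(axis=0)
    fake_queries = rng.uniform(lo, hi, size=(num_fake, 3))
    queries = np.concatenate([real_queries, fake_queries], axis=0)
    labels = np.concatenate([np.ones(num_real), np.zeros(num_fake)])
    return queries, labels
```

A decoder would then classify each query point as occupied or not; because labels depend only on occupancy rather than exact point identity, the objective is robust to point sampling variance.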
Benchmarks
| Benchmark | Methodology | Overall Accuracy (%) | Standard Deviation |
|---|---|---|---|
| few-shot-3d-point-cloud-classification-on-1 | MaskPoint | 95.0 | 3.7 |
| few-shot-3d-point-cloud-classification-on-2 | MaskPoint | 97.2 | 1.7 |
| few-shot-3d-point-cloud-classification-on-3 | MaskPoint | 91.4 | 4.0 |
| few-shot-3d-point-cloud-classification-on-4 | MaskPoint | 93.4 | 3.5 |