Zhang Renrui; Guo Ziyu; Zhang Wei; Li Kunchang; Miao Xupeng; Cui Bin; Qiao Yu; Gao Peng; Li Hongsheng

Abstract
Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training (CLIP) have shown inspirational performance on 2D visual recognition, which learns to match images with their corresponding texts in open-vocabulary settings. However, it remains underexplored whether CLIP, pre-trained on large-scale image-text pairs in 2D, can be generalized to 3D recognition. In this paper, we show that such a setting is feasible by proposing PointCLIP, which conducts alignment between CLIP-encoded point clouds and 3D category texts. Specifically, we encode a point cloud by projecting it into multi-view depth maps without rendering, and aggregate the view-wise zero-shot predictions to achieve knowledge transfer from 2D to 3D. On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D. By fine-tuning only the lightweight adapter in the few-shot settings, the performance of PointCLIP can be largely improved. In addition, we observe a complementary property between PointCLIP and classical 3D-supervised networks. By simple ensembling, PointCLIP boosts the baseline's performance and even surpasses state-of-the-art models. Therefore, PointCLIP is a promising alternative for effective 3D point cloud understanding via CLIP under low resource cost and data regime. We conduct thorough experiments on the widely adopted ModelNet10, ModelNet40 and the challenging ScanObjectNN to demonstrate the effectiveness of PointCLIP. The code is released at https://github.com/ZrrSkywalker/PointCLIP.
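The two zero-shot steps named in the abstract (projecting a point cloud into depth maps, then aggregating view-wise predictions) can be illustrated with a minimal pure-Python sketch. This is not the released implementation: the function names are hypothetical, the projection is assumed to be a simple orthographic one along the z-axis, the feature vectors stand in for the outputs of CLIP's image and text encoders, and the logit scale of 100 is an assumed temperature.

```python
import math

def project_to_depth_map(points, size=8):
    """Assumed orthographic projection along z for points in [-1, 1]^3.

    Each pixel keeps the largest normalized depth among the points
    that fall into it; empty pixels stay at 0.
    """
    depth = [[0.0] * size for _ in range(size)]
    for x, y, z in points:
        col = min(size - 1, max(0, int((x + 1) / 2 * size)))
        row = min(size - 1, max(0, int((1 - y) / 2 * size)))
        d = (z + 1) / 2  # map z from [-1, 1] to [0, 1]
        depth[row][col] = max(depth[row][col], d)
    return depth

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def view_logits(view_feat, text_feats, scale=100.0):
    """Scaled cosine similarity between one view feature and each class text feature."""
    v = normalize(view_feat)
    return [scale * sum(a * b for a, b in zip(v, normalize(t))) for t in text_feats]

def aggregate_views(view_feats, text_feats, view_weights=None):
    """Weighted sum of per-view logits; equal weights if none are given."""
    if view_weights is None:
        view_weights = [1.0] * len(view_feats)
    total = [0.0] * len(text_feats)
    for w, feat in zip(view_weights, view_feats):
        for k, logit in enumerate(view_logits(feat, text_feats)):
            total[k] += w * logit
    return total

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy example: three views, two classes, 2-D stand-in features.
view_feats = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
text_feats = [[1.0, 0.0], [0.0, 1.0]]
probs = softmax(aggregate_views(view_feats, text_feats))
predicted_class = probs.index(max(probs))
```

In this toy setup, two of the three views agree with the first class, so the aggregated prediction favors it. Changing `view_weights` shifts how much each view contributes, which is one simple way to express that some views are more informative than others.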
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-open-vocabulary-instance-segmentation-on-3 | PointCLIP | AP50: 2.6 |
| training-free-3d-part-segmentation-on | PointCLIP | mIoU: 31.0; Need 3D Data?: No |
| training-free-3d-point-cloud-classification | PointCLIP | Accuracy (%): 20.2; Need 3D Data?: No |
| training-free-3d-point-cloud-classification-1 | PointCLIP | Accuracy (%): 15.4; Need 3D Data?: No |
| zero-shot-transfer-3d-point-cloud | PointCLIP | Accuracy (%): 20.18 |
| zero-shot-transfer-3d-point-cloud-1 | PointCLIP | Accuracy (%): 30.23 |
| zero-shot-transfer-3d-point-cloud-2 | PointCLIP | OBJ_BG Accuracy (%): 21.34; OBJ_ONLY Accuracy (%): 19.28; PB_T50_RS Accuracy (%): 15.38 |