Qi Zekun; Dong Runpei; Zhang Shaochen; Geng Haoran; Han Chunrui; Ge Zheng; Yi Li; Ma Kaisheng

Abstract
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages. ShapeLLM is built upon an improved 3D encoder by extending ReCon to ReCon++ that benefits from multi-view image distillation for enhanced geometry understanding. By utilizing ReCon++ as the 3D point cloud input encoder for LLMs, ShapeLLM is trained on constructed instruction-following data and tested on our newly human-curated benchmark, 3D MM-Vet. ReCon++ and ShapeLLM achieve state-of-the-art performance in 3D geometry understanding and language-unified 3D interaction tasks, such as embodied visual grounding. Project page: https://qizekun.github.io/shapellm/
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| 3d-object-captioning-on-objaverse-1 | ShapeLLM-13B | Sentence-BERT: 48.52; GPT-4: 48.94; SimCSE: 49.98 |
| 3d-object-captioning-on-objaverse-1 | ShapeLLM-7B | Sentence-BERT: 48.20; GPT-4: 46.92; SimCSE: 49.23 |
| 3d-point-cloud-classification-on-modelnet40 | ReCon++ | Overall Accuracy: 95.0 |
| 3d-point-cloud-classification-on-scanobjectnn | ReCon++ | OBJ-BG (OA): 98.80; OBJ-ONLY (OA): 97.59; Overall Accuracy: 95.25 |
| 3d-point-cloud-linear-classification-on | ReCon++ | Overall Accuracy: 93.6 |
| 3d-question-answering-3d-qa-on-3d-mm-vet | ShapeLLM-13B | Overall Accuracy: 53.1 |
| 3d-question-answering-3d-qa-on-3d-mm-vet | ShapeLLM-7B | Overall Accuracy: 47.4 |
| few-shot-3d-point-cloud-classification-on-1 | ReCon++ | Overall Accuracy: 98.0; Standard Deviation: 2.3 |
| few-shot-3d-point-cloud-classification-on-2 | ReCon++ | Overall Accuracy: 99.5; Standard Deviation: 0.8 |
| few-shot-3d-point-cloud-classification-on-3 | ReCon++ | Overall Accuracy: 94.5; Standard Deviation: 4.1 |
| few-shot-3d-point-cloud-classification-on-4 | ReCon++ | Overall Accuracy: 96.5; Standard Deviation: 3.0 |
| generative-3d-object-classification-on-1 | ShapeLLM-13B | Objaverse (Average): 54.00 |
| generative-3d-object-classification-on-1 | ShapeLLM-7B | Objaverse (Average): 54.50 |
| generative-3d-object-classification-on-2 | ShapeLLM-13B | ModelNet40 (Average): 52.96 |
| generative-3d-object-classification-on-2 | ShapeLLM-7B | ModelNet40 (Average): 53.08 |
| zero-shot-transfer-3d-point-cloud | ReCon++ | Accuracy (%): 87.3 |
| zero-shot-transfer-3d-point-cloud-2 | ReCon++ | OBJ_ONLY Accuracy(%): 65.4 |