MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors
Tang Yuan; Han Xu; Li Xianzhi; Yu Qiao; Hao Yixue; Hu Long; Chen Min

Abstract
Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their success, large 3D point cloud-language models (3D-LLMs) also integrate point clouds into LLMs. However, directly aligning point clouds with LLM requires expensive training costs, typically in hundreds of GPU-hours on A100, which hinders the development of 3D-LLMs. In this paper, we introduce MiniGPT-3D, an efficient and powerful 3D-LLM that achieves multiple SOTA results while training for only 27 hours on one RTX 3090. Specifically, we propose to align 3D point clouds with LLMs using 2D priors from 2D-LLMs, which can leverage the similarity between 2D and 3D visual information. We introduce a novel four-stage training strategy for modality alignment in a cascaded way, and a mixture of query experts module to adaptively aggregate features with high efficiency. Moreover, we utilize parameter-efficient fine-tuning methods LoRA and Norm fine-tuning, resulting in only 47.8M learnable parameters, which is up to 260x fewer than existing methods. Extensive experiments show that MiniGPT-3D achieves SOTA on 3D object classification and captioning tasks, with significantly cheaper training costs. Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800. We are the first to explore the efficient 3D-LLM, offering new insights to the community. Code and weights are available at https://github.com/TangYuan96/MiniGPT-3D.
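To make the parameter-efficiency claim concrete, the sketch below counts the trainable parameters LoRA adds to a single linear layer and compares that with full fine-tuning of the same layer, then works out what "up to 260x fewer" implies for the largest baseline. The layer size and rank are hypothetical values picked for illustration; they are not taken from the paper, and the helper names are not part of the MiniGPT-3D code.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) linear layer.

    LoRA freezes the weight W and learns a low-rank update B @ A,
    with A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated (bias ignored)."""
    return d_in * d_out


# Hypothetical layer size and rank, chosen only for illustration.
d_in, d_out, rank = 2048, 2048, 8
lora = lora_trainable_params(d_in, d_out, rank)
full = full_finetune_params(d_in, d_out)
reduction = full / lora

# Figures quoted in the abstract: 47.8M learnable parameters, up to 260x fewer.
minigpt3d_params = 47.8e6
implied_largest_baseline = minigpt3d_params * 260

print(f"LoRA: {lora:,} trainable vs full: {full:,} ({reduction:.0f}x fewer)")
print(f"Implied largest baseline: {implied_largest_baseline / 1e9:.1f}B trainable parameters")
```

The per-layer saving grows with layer width because LoRA scales with `d_in + d_out` rather than `d_in * d_out`. The abstract's 260x figure is a whole-model comparison, so it also reflects which modules are left frozen.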
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-object-captioning-on-objaverse-1 | MiniGPT-3D | Sentence-BERT: 49.54; Correctness: 3.50; GPT-4: 57.06; Hallucination: 0.71; Precision: 83.14; SimCSE: 51.39 |
| generative-3d-object-classification-on-1 | MiniGPT-3D | Objaverse (Average): 60.25; Objaverse (C): 60.50; Objaverse (I): 60.00 |
| generative-3d-object-classification-on-2 | MiniGPT-3D | ModelNet40 (Average): 60.86; ModelNet40 (C): 59.97; ModelNet40 (I): 61.75 |
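The classification rows in the table above each report an (I) score, a (C) score, and an Average. The page does not say how the Average is computed; the short check below assumes it is the plain mean of the (I) and (C) scores and tests that assumption against the listed numbers. The dictionary layout and function name are illustrative only.

```python
# Scores copied from the benchmark table; "I" and "C" are the two reported columns.
results = {
    "Objaverse": {"I": 60.00, "C": 60.50, "Average": 60.25},
    "ModelNet40": {"I": 61.75, "C": 59.97, "Average": 60.86},
}


def mean_of_columns(row: dict) -> float:
    """Plain mean of the (I) and (C) scores, rounded to two decimals."""
    return round((row["I"] + row["C"]) / 2, 2)


computed = {name: mean_of_columns(row) for name, row in results.items()}

for name, row in results.items():
    # Compare with a small tolerance to avoid floating-point noise.
    matches = abs(computed[name] - row["Average"]) < 0.005
    print(f"{name}: computed {computed[name]:.2f}, reported {row['Average']:.2f}, match={matches}")
```

Both reported averages agree with the mean of the two columns to two decimal places, which supports the assumption.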