5 months ago

MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

Tang Yuan ; Han Xu ; Li Xianzhi ; Yu Qiao ; Hao Yixue ; Hu Long ; Chen Min

Abstract

Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their success, large 3D point cloud-language models (3D-LLMs) also integrate point clouds into LLMs. However, directly aligning point clouds with LLM requires expensive training costs, typically in hundreds of GPU-hours on A100, which hinders the development of 3D-LLMs. In this paper, we introduce MiniGPT-3D, an efficient and powerful 3D-LLM that achieves multiple SOTA results while training for only 27 hours on one RTX 3090. Specifically, we propose to align 3D point clouds with LLMs using 2D priors from 2D-LLMs, which can leverage the similarity between 2D and 3D visual information. We introduce a novel four-stage training strategy for modality alignment in a cascaded way, and a mixture of query experts module to adaptively aggregate features with high efficiency. Moreover, we utilize parameter-efficient fine-tuning methods LoRA and Norm fine-tuning, resulting in only 47.8M learnable parameters, which is up to 260x fewer than existing methods. Extensive experiments show that MiniGPT-3D achieves SOTA on 3D object classification and captioning tasks, with significantly cheaper training costs. Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800. We are the first to explore the efficient 3D-LLM, offering new insights to the community. Code and weights are available at https://github.com/TangYuan96/MiniGPT-3D.
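The efficiency figures in the abstract can be put side by side with a little arithmetic. The sketch below is only a back-of-the-envelope reading of the quoted numbers, not a calculation from the paper: the variable names are illustrative, and the GPU-hour ratio assumes an RTX 3090 hour and an A800 hour are comparable, which understates the gap because the A800 is the larger card.

```python
# Figures quoted in the abstract.
learnable_params = 47.8e6      # MiniGPT-3D learnable parameters
reduction_factor = 260         # "up to 260x fewer than existing methods"
minigpt3d_gpu_hours = 27 * 1   # 27 hours on one RTX 3090
shapellm_gpu_hours = 160       # ShapeLLM-13B, total GPU-hours on 8 A800

# Learnable-parameter count implied for the largest baseline (assumption:
# "260x" is applied to the 47.8M figure directly).
implied_baseline_params = learnable_params * reduction_factor

# Naive GPU-hour ratio (assumption: one RTX 3090 hour is treated as
# equivalent to one A800 hour, ignoring hardware differences).
gpu_hour_ratio = shapellm_gpu_hours / minigpt3d_gpu_hours

print(f"Implied largest baseline: {implied_baseline_params / 1e9:.1f}B parameters")
print(f"GPU-hour ratio vs. ShapeLLM-13B: {gpu_hour_ratio:.1f}x")
```

Under these assumptions the implied baseline has roughly 12.4B learnable parameters, which is in line with a fully trained 13B-scale model, and MiniGPT-3D uses about 5.9x fewer GPU-hours than ShapeLLM-13B.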

Code Repositories

tangyuan96/minigpt-3d
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
3d-object-captioning-on-objaverse-1 | MiniGPT-3D | Sentence-BERT: 49.54; Correctness: 3.50; GPT-4: 57.06; Hallucination: 0.71; Precision: 83.14; SimCSE: 51.39
generative-3d-object-classification-on-1 | MiniGPT-3D | Objaverse (Average): 60.25; Objaverse (C): 60.50; Objaverse (I): 60.00
generative-3d-object-classification-on-2 | MiniGPT-3D | ModelNet40 (Average): 60.86; ModelNet40 (C): 59.97; ModelNet40 (I): 61.75
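
As a quick sanity check on the classification rows, the listed "Average" values appear to be the plain mean of the (C) and (I) scores. The sketch below recomputes them under that assumption; the dictionary layout and the function name are illustrative only and do not come from the benchmark page.

```python
# Classification accuracies as listed in the benchmark entries above.
scores = {
    "Objaverse": {"C": 60.50, "I": 60.00, "Average": 60.25},
    "ModelNet40": {"C": 59.97, "I": 61.75, "Average": 60.86},
}


def mean_of_variants(entry):
    """Plain mean of the (C) and (I) scores, rounded to two decimals."""
    return round((entry["C"] + entry["I"]) / 2, 2)


# Assumption: "Average" is the unweighted mean of the two listed variants.
recomputed = {name: mean_of_variants(entry) for name, entry in scores.items()}

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.2f}, listed {scores[name]['Average']:.2f}")
```

Both recomputed values match the listed averages to two decimals, which is consistent with the unweighted-mean reading.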
