4 months ago

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

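Abstract

This paper presents ShapeLLM, the first 3D multimodal large language model (LLM) designed for embodied interaction, exploring universal 3D object understanding with 3D point clouds and language. ShapeLLM is built on an improved 3D encoder, obtained by extending ReCon to ReCon++, which uses multi-view image distillation to strengthen geometry understanding. With ReCon++ as the 3D point cloud input encoder for the LLM, ShapeLLM is trained on constructed instruction-following data and evaluated on 3D MM-Vet, the authors' new human-curated benchmark. ReCon++ and ShapeLLM achieve state-of-the-art performance in 3D geometry understanding and in language-unified 3D interaction tasks such as embodied visual grounding. Project page: https://qizekun.github.io/shapellm/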

Code repository

qizekun/ShapeLLM
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Method | Metrics
3d-object-captioning-on-objaverse-1 | ShapeLLM-13B | Sentence-BERT: 48.52; GPT-4: 48.94; SimCSE: 49.98
3d-object-captioning-on-objaverse-1 | ShapeLLM-7B | Sentence-BERT: 48.20; GPT-4: 46.92; SimCSE: 49.23
3d-point-cloud-classification-on-modelnet40 | ReCon++ | Overall Accuracy: 95.0
3d-point-cloud-classification-on-scanobjectnn | ReCon++ | OBJ-BG (OA): 98.80; OBJ-ONLY (OA): 97.59; Overall Accuracy: 95.25
3d-point-cloud-linear-classification-on | ReCon++ | Overall Accuracy: 93.6
3d-question-answering-3d-qa-on-3d-mm-vet | ShapeLLM-13B | Overall Accuracy: 53.1
3d-question-answering-3d-qa-on-3d-mm-vet | ShapeLLM-7B | Overall Accuracy: 47.4
few-shot-3d-point-cloud-classification-on-1 | ReCon++ | Overall Accuracy: 98.0; Standard Deviation: 2.3
few-shot-3d-point-cloud-classification-on-2 | ReCon++ | Overall Accuracy: 99.5; Standard Deviation: 0.8
few-shot-3d-point-cloud-classification-on-3 | ReCon++ | Overall Accuracy: 94.5; Standard Deviation: 4.1
few-shot-3d-point-cloud-classification-on-4 | ReCon++ | Overall Accuracy: 96.5; Standard Deviation: 3.0
generative-3d-object-classification-on-1 | ShapeLLM-13B | Objaverse (Average): 54.00
generative-3d-object-classification-on-1 | ShapeLLM-7B | Objaverse (Average): 54.50
generative-3d-object-classification-on-2 | ShapeLLM-13B | ModelNet40 (Average): 52.96
generative-3d-object-classification-on-2 | ShapeLLM-7B | ModelNet40 (Average): 53.08
zero-shot-transfer-3d-point-cloud | ReCon++ | Accuracy (%): 87.3
zero-shot-transfer-3d-point-cloud-2 | ReCon++ | OBJ_ONLY Accuracy (%): 65.4
