Ding Runyu; Yang Jihan; Xue Chuhui; Zhang Wenqing; Bai Song; Qi Xiaojuan

Abstract
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough of 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich vocabulary concepts. However, this success cannot be directly transferred to 3D scenarios due to the inaccessibility of large-scale 3D-text pairs. To this end, we propose to distill knowledge encoded in pre-trained vision-language (VL) foundation models through captioning multi-view images from 3D, which allows explicitly associating 3D and semantic-rich captions. Further, to foster coarse-to-fine visual-semantic representation learning from captions, we design hierarchical 3D-caption pairs, leveraging geometric constraints between 3D scenes and multi-view images. Finally, by employing contrastive learning, the model learns language-aware embeddings that connect 3D and text for open-vocabulary tasks. Our method not only remarkably outperforms baseline methods by 25.8% $\sim$ 44.7% hIoU and 14.5% $\sim$ 50.4% hAP$_{50}$ in open-vocabulary semantic and instance segmentation, but also shows robust transferability on challenging zero-shot domain transfer tasks. See the project website at https://dingry.github.io/projects/PLA.
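The abstract describes aligning 3D features with caption text embeddings via contrastive learning. The sketch below illustrates one plausible form of such a point-caption contrastive objective; the function name, pooling scheme, tensor shapes, and temperature are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of an InfoNCE-style point-caption contrastive loss.
# Assumes per-point features from a 3D backbone and caption embeddings from a
# frozen vision-language text encoder; the point-to-caption assignment would
# come from the multi-view images used to generate each caption.
import torch
import torch.nn.functional as F


def point_caption_contrastive_loss(point_feats, point_to_caption, caption_embeds,
                                   temperature=0.07):
    """Align pooled 3D region features with their caption text embeddings.

    point_feats:      (N, D) per-point embeddings.
    point_to_caption: (N,) index of the caption each point is associated with.
    caption_embeds:   (C, D) text embeddings of the C captions.
    """
    C, D = caption_embeds.shape

    # Average-pool point features over the 3D region associated with each caption.
    pooled = torch.zeros(C, D, device=point_feats.device)
    pooled.index_add_(0, point_to_caption, point_feats)
    counts = torch.bincount(point_to_caption, minlength=C).clamp(min=1).unsqueeze(1)
    pooled = pooled / counts

    # Cosine-similarity logits between 3D region features and caption embeddings.
    pooled = F.normalize(pooled, dim=-1)
    captions = F.normalize(caption_embeds, dim=-1)
    logits = pooled @ captions.t() / temperature

    # Symmetric cross-entropy: each region matches its own caption and vice versa.
    targets = torch.arange(C, device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Toy usage with random tensors: 1000 points, 8 captions, 512-dim embeddings.
    feats = torch.randn(1000, 512)
    assignment = torch.randint(0, 8, (1000,))
    captions = torch.randn(8, 512)
    print(point_caption_contrastive_loss(feats, assignment, captions).item())
```

In the paper, such pairs are formed hierarchically (scene-, view-, and entity-level), so a loss of this kind would be applied at multiple granularities rather than once as shown here.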
Code Repositories
Benchmarks
| Benchmark | Methodology | AP50 Base (B6/N6) | AP50 Base (B8/N4) | AP50 Novel (B6/N6) | AP50 Novel (B8/N4) |
|---|---|---|---|---|---|
| 3d-open-vocabulary-instance-segmentation-on-2 | PLA | 46.9 | 59.0 | 9.8 | 8.6 |