Command Palette
Search for a command to run...
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang Xiaoyang Wu Xi Chen Hengshuang Zhao Lei Zhu Joan Lasenby
Abstract
In this work, we introduce OpenIns3D, a new 3D-input-only framework for 3Dopen-vocabulary scene understanding. The OpenIns3D framework employs a"Mask-Snap-Lookup" scheme. The "Mask" module learns class-agnostic maskproposals in 3D point clouds, the "Snap" module generates synthetic scene-levelimages at multiple scales and leverages 2D vision-language models to extractinteresting objects, and the "Lookup" module searches through the outcomes of"Snap" to assign category names to the proposed masks. This approach, yetsimple, achieves state-of-the-art performance across a wide range of 3Dopen-vocabulary tasks, including recognition, object detection, and instancesegmentation, on both indoor and outdoor datasets. Moreover, OpenIns3Dfacilitates effortless switching between different 2D detectors withoutrequiring retraining. When integrated with powerful 2D open-world models, itachieves excellent results in scene understanding tasks. Furthermore, whencombined with LLM-powered 2D models, OpenIns3D exhibits an impressivecapability to comprehend and process highly complex text queries that demandintricate reasoning and real-world knowledge. Project page:https://zheninghuang.github.io/OpenIns3D/