8 months ago

3D Machine Vision

Object Detection

Visual Question Answering

Computer Vision

Zhening Huang Xiaoyang Wu Xi Chen Hengshuang Zhao Lei Zhu Joan Lasenby

Abstract

In this work, we introduce OpenIns3D, a new 3D-input-only framework for 3D open-vocabulary scene understanding. The OpenIns3D framework employs a "Mask-Snap-Lookup" scheme. The "Mask" module learns class-agnostic mask proposals in 3D point clouds, the "Snap" module generates synthetic scene-level images at multiple scales and leverages 2D vision-language models to extract interesting objects, and the "Lookup" module searches through the outcomes of "Snap" to assign category names to the proposed masks. Though simple, this approach achieves state-of-the-art performance across a wide range of 3D open-vocabulary tasks, including recognition, object detection, and instance segmentation, on both indoor and outdoor datasets. Moreover, OpenIns3D facilitates effortless switching between different 2D detectors without requiring retraining. When integrated with powerful 2D open-world models, it achieves excellent results in scene understanding tasks. Furthermore, when combined with LLM-powered 2D models, OpenIns3D exhibits an impressive capability to comprehend and process highly complex text queries that demand intricate reasoning and real-world knowledge. Project page: https://zheninghuang.github.io/OpenIns3D/
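
To make the "Lookup" idea concrete, here is a minimal sketch of one way category names could be assigned to 3D mask proposals from 2D detections. It is not the authors' implementation. It assumes each 3D mask has already been projected into each synthetic view as a 2D bounding box, that the 2D detector returns labeled boxes per view, and that a simple IoU match with majority voting across views is good enough. The names `box_iou`, `lookup_labels`, and `iou_threshold` are hypothetical and do not come from the paper.

```python
from collections import Counter


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def lookup_labels(projected_masks, detections, iou_threshold=0.5):
    """Assign a category name to each 3D mask proposal (illustrative only).

    projected_masks: {mask_id: {view_id: box}}, the 2D footprint of each
        3D mask in each view (assumed to be precomputed).
    detections: {view_id: [(label, box), ...]}, the 2D detector output.
    Returns {mask_id: label}, with None when no view gives a match.
    """
    labels = {}
    for mask_id, views in projected_masks.items():
        votes = Counter()
        for view_id, mask_box in views.items():
            best_label, best_iou = None, iou_threshold
            for label, det_box in detections.get(view_id, []):
                iou = box_iou(mask_box, det_box)
                if iou >= best_iou:
                    best_label, best_iou = label, iou
            if best_label is not None:
                votes[best_label] += 1  # one vote per view
        labels[mask_id] = votes.most_common(1)[0][0] if votes else None
    return labels


# Toy data: two mask proposals, two views, made-up boxes and labels.
projected = {
    0: {"v0": (0, 0, 10, 10), "v1": (5, 5, 15, 15)},
    1: {"v0": (50, 50, 60, 60)},
}
detected = {
    "v0": [("chair", (0, 0, 10, 10)), ("table", (40, 40, 45, 45))],
    "v1": [("chair", (5, 5, 15, 16))],
}
result = lookup_labels(projected, detected)
```

In this sketch the 2D detector is just a dictionary of labeled boxes, so swapping in a different detector changes only the `detections` input and nothing else, which mirrors the abstract's point that 2D detectors can be switched without retraining.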


