3 months ago

Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

Abstract

Complex 3D scene understanding has gained increasing attention, with scene encoding strategies playing a crucial role in this success. However, the optimal scene encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios. Our evaluation spans seven vision foundation encoders, including image-based, video-based, and 3D foundation models. We evaluate these models in four tasks: Vision-Language Scene Reasoning, Visual Grounding, Segmentation, and Registration, each focusing on different aspects of scene understanding. Our evaluations yield key findings: DINOv2 demonstrates superior performance, video models excel in object-level tasks, diffusion models benefit geometric tasks, and language-pretrained models show unexpected limitations in language-related tasks. These insights challenge some conventional understandings, provide novel perspectives on leveraging visual foundation models, and highlight the need for more flexible encoder selection in future vision-language and scene-understanding tasks. Code: https://github.com/YunzeMan/Lexicon3D

Code Repositories

yunzeman/lexicon3d
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
question-answering-on-sqa3d | Lexicon3D | AnswerExactMatch (Question Answering): 50.7
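
For readers unfamiliar with the AnswerExactMatch metric reported above, exact-match scores for question answering are commonly computed as the percentage of predicted answers that are identical to the reference answer after light normalization. The sketch below illustrates that general idea only. The function names and the normalization (lowercasing and collapsing whitespace) are assumptions made for illustration and are not taken from the Lexicon3D code or the SQA3D evaluation script, which may normalize answers differently.

```python
from typing import Sequence


def normalize(answer: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(answer.lower().split())


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of predictions equal to their reference.

    Hypothetical helper for illustration; not the benchmark's official scorer.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Toy data, invented for this example.
    preds = ["Chair", "left", "two"]
    refs = ["chair", "right", "two"]
    print(f"Exact match: {exact_match(preds, refs):.1f}")
```

Under this reading, a score of 50.7 would mean that roughly half of the predicted answers matched the reference answers exactly.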
