5 months ago

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

Mo Wentao ; Liu Yang

Abstract

In 3D Visual Question Answering (3D VQA), the scarcity of fully annotated data and limited visual content diversity hamper generalization to novel scenes and 3D concepts (e.g., only around 800 scenes are utilized in the ScanQA and SQA datasets). Current approaches resort to supplementing 3D reasoning with 2D information. However, these methods face challenges: either they use top-down 2D views that introduce overly complex and sometimes question-irrelevant visual clues, or they rely on globally aggregated scene/image-level representations from 2D VLMs, losing the fine-grained vision-language correlations. To overcome these limitations, our approach utilizes a question-conditional 2D view selection procedure, pinpointing semantically relevant 2D inputs for crucial visual clues. We then integrate this 2D knowledge into the 3D-VQA system via a two-branch Transformer structure. This structure, featuring a Twin-Transformer design, compactly combines 2D and 3D modalities and captures fine-grained correlations between modalities, allowing them to mutually augment each other. Integrating the mechanisms proposed above, we present BridgeQA, which offers a fresh perspective on multi-modal transformer-based architectures for 3D-VQA. Experiments validate that BridgeQA achieves state-of-the-art results on 3D-VQA datasets and significantly outperforms existing solutions. Code is available at https://github.com/matthewdm0816/BridgeQA.
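The abstract does not say how question-conditional view selection scores relevance. As a rough illustration of the general idea only, the sketch below assumes that the question and each candidate 2D view have already been embedded into a shared vector space by some vision-language model, and that relevance is measured by cosine similarity. The function names, the `top_k` parameter, and the similarity choice are all hypothetical and are not taken from the BridgeQA code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_views(question_embedding, view_embeddings, top_k=1):
    """Return indices of the top_k views most similar to the question.

    Assumption: embeddings come from a shared vision-language space;
    this page does not specify the scoring used by the authors.
    """
    scored = [
        (cosine_similarity(question_embedding, view), index)
        for index, view in enumerate(view_embeddings)
    ]
    # Highest similarity first; ties are broken by the lower view index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:top_k]]


# Toy usage with made-up 2-dimensional embeddings.
question = [1.0, 0.0]
views = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
selected = select_views(question, views, top_k=2)
```

In this toy setup the selected views would then be passed to the 2D branch, while unrelated views are dropped before fusion with the 3D features.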

Code Repositories

matthewdm0816/bridgeqa
Official
pytorch

Benchmarks

Benchmark: 3d-question-answering-3d-qa-on-scanqa-test-w
Methodology: BridgeQA
Metrics:
BLEU-1: 34.49
BLEU-4: 24.06
CIDEr: 83.75
Exact Match: 31.29
METEOR: 16.51
ROUGE: 43.26
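This page does not define how the Exact Match score is computed. As a generic illustration, the sketch below treats a prediction as correct when it equals any reference answer after lowercasing and whitespace normalization, and reports the result as a percentage. The normalization rule and the function names are assumptions for illustration, not the benchmark's official evaluation script.

```python
def normalize(answer):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(answer.lower().split())


def exact_match(predictions, references):
    """Percentage of predictions equal to any of their reference answers.

    predictions: list of predicted answer strings.
    references: list of lists of acceptable answer strings, one list per question.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = 0
    for predicted, accepted in zip(predictions, references):
        accepted_normalized = {normalize(answer) for answer in accepted}
        if normalize(predicted) in accepted_normalized:
            hits += 1
    return 100.0 * hits / len(predictions)


# Toy usage with made-up answers: one of two predictions matches.
score = exact_match(
    ["Brown chair", "two"],
    [["brown chair", "chair"], ["three"]],
)
```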
