Question Answering on SQA3D
Evaluation Metrics
AnswerExactMatch (Question Answering)
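For orientation, an exact-match score of this kind can be sketched as the percentage of predicted answers that are identical to the reference answer. The function name `exact_match_accuracy` and the normalization step (lowercasing and collapsing whitespace) are assumptions made for illustration. They are not taken from the benchmark's official evaluation script, which may normalize answers differently.

```python
def exact_match_accuracy(predictions, references):
    """Return the percentage of predictions equal to their reference answer.

    Hypothetical helper: the normalization below is an assumption, not the
    benchmark's official protocol.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalize(text):
        # Assumed normalization: lowercase and collapse runs of whitespace.
        return " ".join(text.lower().split())

    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(references)
```

Under these assumptions, a call such as `exact_match_accuracy(["Left", "two"], ["left", "three"])` counts the first answer as a match and the second as a miss.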
Evaluation Results
Performance of each model on this benchmark
| Model | AnswerExactMatch | Paper Title | Repository |
|---|---|---|---|
| CREMA | 54.6 | CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion | |
| Situation3D | 52.6 | Situational Awareness Matters in 3D Vision Language Reasoning | |
| Lexicon3D | 50.7 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | |
| LM4VisualEncoding | 48.09 | Frozen Transformers in Language Models Are Effective Visual Encoder Layers | |
| ScanQA (w/ auxiliary loss) | 47.20 | SQA3D: Situated Question Answering in 3D Scenes | |
| ScanQA | 46.58 | SQA3D: Situated Question Answering in 3D Scenes | |
| MCAN | 43.42 | Deep Modular Co-Attention Networks for Visual Question Answering | |