HyperAI超神经 (HyperAI)
Visual Question Answering on OK-VQA
Task: Visual Question Answering (VQA)
Evaluation metric: Accuracy
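The page names the metric only as "Accuracy". OK-VQA results are conventionally reported with the soft VQA accuracy from the original VQA benchmark, in which an answer counts as fully correct if at least three human annotators gave it. The sketch below assumes that metric. The function names `vqa_accuracy` and `dataset_accuracy` are hypothetical. The official VQA evaluation script also normalizes punctuation, articles, and number words, but this sketch only lowercases answers and strips surrounding whitespace.

```python
from itertools import combinations


def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Soft VQA accuracy for one question (simplified sketch, assumed metric).

    For every leave-one-out subset of the human answers, the prediction scores
    min(matches / 3, 1). The per-question score is the mean over those subsets.
    """
    if not human_answers:
        raise ValueError("need at least one human answer")
    # Simplified normalization: lowercase and strip whitespace only.
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    n = len(answers)
    scores = []
    # Each subset leaves out exactly one annotator.
    for subset in combinations(range(n), n - 1):
        matches = sum(1 for i in subset if answers[i] == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


def dataset_accuracy(predictions: list[str], answer_lists: list[list[str]]) -> float:
    """Mean per-question accuracy, expressed as a percentage like the table."""
    per_question = [vqa_accuracy(p, a) for p, a in zip(predictions, answer_lists)]
    return 100.0 * sum(per_question) / len(per_question)


if __name__ == "__main__":
    answers = ["dog"] * 2 + ["cat"] * 8  # ten hypothetical annotator answers
    print(vqa_accuracy("cat", answers))   # matched by 8 annotators: full credit
    print(vqa_accuracy("dog", answers))   # matched by only 2 annotators: partial credit
    print(dataset_accuracy(["cat", "bird"], [answers, answers]))
```

Under this metric, partial credit is why leaderboard numbers such as 62.08 are not simply the percentage of questions answered exactly right.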
Evaluation results
Performance of each model on this benchmark:
| Model | Accuracy (%) | Paper title | Repository |
|---|---|---|---|
| PaLI-X-VPD | 66.8 | Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | - |
| PaLM-E-562B | 66.1 | PaLM-E: An Embodied Multimodal Language Model | |
| PaLI-X (Single-task FT) | 66.1 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
| PaLI 17B | 64.5 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| Prophet | 62.5 | Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering | |
| RA-VQA-v2 (BLIP 2) | 62.08 | Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering | |
| A Simple Baseline for KB-VQA | 61.2 | A Simple Baseline for Knowledge-Based Visual Question Answering | - |
| PromptCap | 60.4 | PromptCap: Prompt-Guided Task-Aware Image Captioning | |
| ReVeaL WIT + CC12M + Wikidata + VQA-2 | 59.1 | REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory | |
| Lyrics | 58.2 | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | - |
| REVIVE (Ensemble) | 58.0 | REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering | |
| REVIVE (Single) | 56.6 | REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering | |
| RA-VQA-v2 (T5-large) | 54.85 | Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering | |
| RA-VQA (T5-large) | 54.48 | Retrieval Augmented Visual Question Answering with Outside Knowledge | |
| VK-OOD | 52.4 | Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | - |
| VK-OOD | 52.4 | Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | |
| RA-VQA-FrDPR (T5-large) | 51.22 | Retrieval Augmented Visual Question Answering with Outside Knowledge | |
| Flamingo80B | 50.6 | Flamingo: a Visual Language Model for Few-Shot Learning | |
| TRiG (T5-Large) | 50.50 | Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | - |
| HYDRA | 48.6 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | |
This page lists 20 of the 37 entries on the leaderboard.