Visual Question Answering on VQA v2 test-std
Evaluation metric
overall
Results
Performance of each model on this benchmark
| Model name | overall | Paper Title | Repository |
| --- | --- | --- | --- |
| BEiT-3 | 84.03 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | |
| mPLUG-Huge | 83.62 | mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | |
| ONE-PEACE | 82.52 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | |
| X2-VLM (large) | 81.8 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | |
| VLMo | 81.30 | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | |
| SimVLM | 80.34 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | |
| X2-VLM (base) | 80.2 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | |
| VAST | 80.19 | - | - |
| VALOR | 78.62 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | |
| Prompt Tuning | 78.53 | Prompt Tuning for Generative Multimodal Pretrained Models | |
| Prismer | 78.49 | Prismer: A Vision-Language Model with Multi-Task Experts | |
| MSR + MS Cog. Svcs., X10 models | 77.45 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| MSR + MS Cog. Svcs. | 76.63 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| ALBEF (14M) | 76.04 | Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | |
| BGN, ensemble | 75.92 | Bilinear Graph Networks for Visual Question Answering | - |
| ERNIE-ViL-single model | 74.93 | ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | - |
| Single, w/o VLP | 74.16 | In Defense of Grid Features for Visual Question Answering | |
| Single, w/o VLP | 73.86 | Deep Multimodal Neural Architecture Search | |
| UNITER (Large) | 73.4 | UNITER: UNiversal Image-TExt Representation Learning | |
| X-101 grid features + MCAN | 72.71 | In Defense of Grid Features for Visual Question Answering | |