HyperAI超神经
视觉推理
Visual Reasoning on NLVR2 (dev)
Evaluation metric: Accuracy

Evaluation results: performance of each model on this benchmark.
| Model | Accuracy | Paper Title | Repository |
|---|---|---|---|
| BEiT-3 | 91.51 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | - |
| X2-VLM (large) | 88.7 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | - |
| XFM (base) | 87.6 | Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | - |
| X2-VLM (base) | 86.2 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | - |
| CoCa | 86.1 | CoCa: Contrastive Captioners are Image-Text Foundation Models | - |
| VLMo | 85.64 | VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | - |
| VK-OOD | 84.6 | Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | - |
| SimVLM | 84.53 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | - |
| X-VLM (base) | 84.41 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | - |
| VK-OOD | 83.9 | Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | - |
| ALBEF (14M) | 83.14 | Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | - |
| SOHO | 76.37 | Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning | - |
| ViLT-B/32 | 75.7 | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | - |
| LXMERT (Pre-train + scratch) | 74.9 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | - |
| VisualBERT | 66.7 | VisualBERT: A Simple and Performant Baseline for Vision and Language | - |
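The Accuracy scores above are the percentage of NLVR2 dev examples classified correctly: each example pairs two images with a natural-language sentence, and the model predicts whether the sentence is true of the image pair. A minimal sketch of the metric, assuming predictions and gold labels are already available as booleans (the function name and data format here are illustrative, not from any official NLVR2 evaluation script):

```python
def nlvr2_accuracy(predictions, labels):
    """Percentage of examples where the predicted True/False label
    matches the gold label. Both arguments are equal-length lists of
    booleans, one entry per (image pair, sentence) example."""
    if len(predictions) != len(labels):
        raise ValueError("prediction/label count mismatch")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Toy example: 3 of 4 predictions match the gold labels.
preds = [True, False, True, True]
golds = [True, False, False, True]
print(nlvr2_accuracy(preds, golds))  # → 75.0
```

On this scale, BEiT-3's 91.51 means roughly 9 in 10 dev examples are labeled correctly.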