HyperAI
Visual Question Answering on DocVQA (test set)
Evaluation metric
ANLS (Average Normalized Levenshtein Similarity)
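ANLS was introduced with the ST-VQA benchmark and is reused by DocVQA. It scores each question by comparing the predicted answer a with every ground-truth answer g. Let NL(a, g) be the Levenshtein edit distance divided by the length of the longer string. The per-answer score is 1 − NL(a, g) when NL(a, g) < τ (τ = 0.5) and 0 otherwise. Each question takes its best score over its ground-truth answers, and ANLS is the mean over all questions. The following is a minimal sketch of that definition, not the official evaluation script. The function names `levenshtein` and `anls` are hypothetical. The lowercase-and-strip normalization is an assumption that may differ in detail from the official scorer.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         gold_answers: Sequence[Sequence[str]],
         tau: float = 0.5) -> float:
    """Mean over questions of the best thresholded similarity to any gold answer."""
    scores = []
    for pred, golds in zip(predictions, gold_answers):
        p = pred.strip().lower()  # assumed normalization, not taken from the source
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            # Similarity counts only when the normalized distance is below tau.
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    score = anls(["1995", "recieved", "total"],
                 [["1995"], ["received"], ["total amount"]])
    print(round(score, 4))  # 0.5833
```

In this example, the exact match scores 1.0. The misspelling "recieved" is 2 edits away from an 8-character answer, so it keeps 0.75. "total" differs from "total amount" by 7 of 12 characters, which exceeds τ, so it scores 0. The threshold is what lets ANLS tolerate small OCR slips while still giving zero credit to substantially wrong answers.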
Evaluation results
Performance of each model on this benchmark:
Model | ANLS | Paper
Human | 0.9436 | DocVQA: A Dataset for VQA on Document Images
MLCD-Embodied-7B | 0.916 | Multi-label Cluster Discrimination for Visual Representation Learning
SMoLA-PaLI-X Specialist | 0.908 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
SMoLA-PaLI-X Generalist | 0.906 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Qwen-VL-Plus | 0.9024 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
ScreenAI 5B (4.62B params, w/ OCR) | 0.8988 | ScreenAI: A Vision-Language Model for UI and Infographics Understanding
PaLI-3 (w/ OCR) | 0.886 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger
ERNIE-Layout large (ensemble) | 0.8841 | ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
GPT-4 | 0.884 | Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
DocFormerv2-large | 0.8784 | DocFormerv2: Local Features for Document Understanding
UDOP (aux) | 0.878 | Unifying Vision, Text, and Layout for Universal Document Processing
PaLI-3 | 0.876 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger
TILT-Large | 0.8705 | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
PaLI-X (Single-task FT w/ OCR) | 0.868 | PaLI-X: On Scaling up a Multilingual Vision and Language Model
LayoutLMv2 LARGE | 0.8672 | LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
ERNIE-Layout large | 0.8486 | ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
UDOP | 0.847 | Unifying Vision, Text, and Layout for Universal Document Processing
TILT-Base | 0.8392 | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Claude + LATIN-Prompt | 0.8336 | Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
GPT-3.5 + LATIN-Prompt | 0.8255 | Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
Showing the top 20 of 33 leaderboard entries.
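The leaderboard includes a human baseline (0.9436 ANLS), so one way to read it is to compute how far each system falls below that reference. The following is a small sketch of that arithmetic. It uses a few (model, ANLS) pairs copied from the table. The names `HUMAN_ANLS`, `top_entries` and `gap_to_human` are hypothetical and exist only for illustration.

```python
# Human baseline and a few entries copied from the DocVQA test leaderboard.
HUMAN_ANLS = 0.9436
top_entries = {
    "MLCD-Embodied-7B": 0.916,
    "SMoLA-PaLI-X Specialist": 0.908,
    "Qwen-VL-Plus": 0.9024,
    "GPT-4": 0.884,
}


def gap_to_human(score: float, human: float = HUMAN_ANLS) -> float:
    """Absolute ANLS difference below the human baseline, rounded to 4 decimals."""
    return round(human - score, 4)


if __name__ == "__main__":
    # List entries from highest to lowest ANLS with their gap to human performance.
    for name, score in sorted(top_entries.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {score:.4f} ANLS, {gap_to_human(score):.4f} below human")
```

By this measure, the best listed system (MLCD-Embodied-7B) is about 0.0276 ANLS below the human baseline. The zero-shot GPT-4 entry is about 0.0596 below it.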