4 months ago

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

Abstract

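Current visual question answering (VQA) tasks mainly focus on answering human-annotated questions about natural images. Beyond natural images, however, abstract diagrams with rich semantics remain understudied in visual understanding and reasoning research. In this work, we introduce a new challenge, Icon Question Answering (IconQA), whose goal is to answer a question in the context of an icon image. We release the IconQA dataset, which contains 107,439 questions and three sub-tasks: multi-image-choice, multi-text-choice, and filling-in-the-blank. The IconQA dataset is inspired by real-world diagram word problems and highlights the importance of abstract diagram understanding and comprehensive cognitive reasoning. IconQA therefore requires not only perception skills such as object recognition and text understanding, but also diverse cognitive reasoning skills such as geometric reasoning, commonsense reasoning, and arithmetic reasoning. To help potential IconQA models learn semantic representations of icon images, we further release Icon645, an icon dataset containing 645,687 colored icons across 377 classes. We conduct extensive user studies and blind experiments, and reproduce a wide range of advanced VQA methods to benchmark the IconQA task. In addition, we develop a strong IconQA baseline, Patch-TRM, which applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset. The IconQA and Icon645 datasets are available at https://iconqa.github.io.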

Code repository

lupantech/iconqa (official, PyTorch)

Benchmarks

Benchmark: visual-question-answering-on-iconqa

Reasoning metrics, by method:

Method       Alg.   Com.   Cou.   Est.   Fra.   Geo.   Mea.   Pat.   Pro.   Sce.   Sen.   Spa.   Tim.
DFAF        50.27  81.69  70.68  99.02  77.60  81.80  98.83  56.60  85.70  67.01  84.11  51.42  67.72
Q-Only      28.02  48.19  33.63  40.46  33.06  38.03  38.07  33.66  40.76  35.37  45.25  37.14  48.09
ViLBERT     50.62  75.60  71.05  99.22  74.09  80.05  99.07  62.78  70.94  58.52  81.78  49.46  66.72
I-Only      31.73  45.26  37.64  62.29  32.48  38.71  64.02  36.29  37.51  35.47  45.25  37.52  47.37
Random      11.12  41.20  18.38   3.62  34.84  30.30   0.36  34.81  38.81  34.25  45.16  36.49  35.82
ViLT        50.55  84.95  71.13  99.02  75.81  82.61  98.91  59.22  87.65  66.72  86.10  53.38  69.99
UNITER      49.18  83.67  71.01  99.41  78.37  81.31  99.38  60.81  87.84  61.25  86.10  48.34  69.77
MCAN        47.32  82.73  68.94  99.08  76.20  79.86  98.99  54.79  84.87  62.49  83.25  49.70  68.00
BAN         47.46  82.12  67.56  97.06  73.77  79.99  96.50  55.67  82.45  66.92  82.12  53.20  66.50
Patch-TRM   56.73  87.00  77.81  98.24  82.13  81.87  97.98  68.75  95.73  62.39  92.49  55.62  77.98
ViT         51.10  82.12  70.84  98.95  77.41  82.60  98.76  58.46  86.07  68.80  84.72  54.64  68.66
Top-Down    50.00  80.65  65.01  99.54  72.43  80.07  99.46  55.01  83.75  58.22  84.54  45.78  68.28

Sub-task metrics, by method:

Method      Blank   Img.   Txt.
DFAF        78.28  77.72  72.17
Q-Only      28.45  41.64  36.86
ViLBERT     77.08  76.66  70.47
I-Only      46.65  41.56  36.02
Random       0.29  41.70  36.87
ViLT        79.27  79.67  72.69
UNITER      78.53  78.71  72.39
MCAN        74.52  77.36  71.25
BAN         75.54  76.33  70.82
Patch-TRM   83.62  82.66  75.19
ViT         78.92  79.15  72.34
Top-Down    73.03  75.92  68.51
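
To make the sub-task comparison easier to read off, the short Python sketch below ranks the methods on each sub-task and reports the gap between the top method and the runner-up. The scores are copied from the benchmark results listed above; the names `SUBTASK_SCORES`, `best_method`, and `margin_over_runner_up` are illustrative helpers introduced here, not part of the IconQA code base.

```python
# Sub-task scores copied from the benchmark results above.
# All names in this sketch are hypothetical helpers, not IconQA APIs.
SUBTASK_SCORES = {
    "DFAF":      {"Blank": 78.28, "Img.": 77.72, "Txt.": 72.17},
    "Q-Only":    {"Blank": 28.45, "Img.": 41.64, "Txt.": 36.86},
    "ViLBERT":   {"Blank": 77.08, "Img.": 76.66, "Txt.": 70.47},
    "I-Only":    {"Blank": 46.65, "Img.": 41.56, "Txt.": 36.02},
    "Random":    {"Blank": 0.29,  "Img.": 41.70, "Txt.": 36.87},
    "ViLT":      {"Blank": 79.27, "Img.": 79.67, "Txt.": 72.69},
    "UNITER":    {"Blank": 78.53, "Img.": 78.71, "Txt.": 72.39},
    "MCAN":      {"Blank": 74.52, "Img.": 77.36, "Txt.": 71.25},
    "BAN":       {"Blank": 75.54, "Img.": 76.33, "Txt.": 70.82},
    "Patch-TRM": {"Blank": 83.62, "Img.": 82.66, "Txt.": 75.19},
    "ViT":       {"Blank": 78.92, "Img.": 79.15, "Txt.": 72.34},
    "Top-Down":  {"Blank": 73.03, "Img.": 75.92, "Txt.": 68.51},
}


def best_method(scores, subtask, exclude=()):
    """Return (method, score) with the highest score on one sub-task."""
    candidates = {m: s[subtask] for m, s in scores.items() if m not in exclude}
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


def margin_over_runner_up(scores, subtask):
    """Return (top method, runner-up, score gap rounded to 2 decimals)."""
    top, top_score = best_method(scores, subtask)
    runner, runner_score = best_method(scores, subtask, exclude=(top,))
    return top, runner, round(top_score - runner_score, 2)


for subtask in ("Blank", "Img.", "Txt."):
    top, runner, gap = margin_over_runner_up(SUBTASK_SCORES, subtask)
    print(f"{subtask}: {top} leads {runner} by {gap}")
```

With the numbers listed on this page, Patch-TRM ranks first on all three sub-tasks and ViLT is the runner-up in each case.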
