
Abstract
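Current visual question answering (VQA) tasks mainly focus on answering human-annotated questions about natural images. However, beyond natural images, abstract diagrams with rich semantics remain understudied in visual understanding and reasoning research. In this work, we introduce a new challenge, Icon Question Answering (IconQA), whose goal is to answer a question in the context of an icon image. We release IconQA, a dataset of 107,439 questions covering three sub-tasks: multi-image-choice, multi-text-choice, and filling-in-the-blank. The IconQA dataset is inspired by real-world diagram word problems and highlights the importance of abstract diagram understanding and comprehensive cognitive reasoning. Thus, IconQA requires not only perception skills such as object recognition and text understanding, but also diverse cognitive reasoning skills such as geometric reasoning, commonsense reasoning, and arithmetic reasoning. To help potential IconQA models learn semantic representations of icon images, we further release Icon645, an icon dataset containing 645,687 colored icons across 377 classes. We conduct extensive user studies and blind experiments, and reproduce a wide range of advanced VQA methods to benchmark the IconQA task. In addition, we develop a strong IconQA baseline, Patch-TRM, which applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset. The IconQA and Icon645 datasets are available at https://iconqa.github.io.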
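The three sub-tasks differ mainly in the form of the answer: the two choice sub-tasks select one option, while filling-in-the-blank produces free text. The following is a minimal sketch of how per-sub-task scoring could look under that reading. The `Problem` fields, the sub-task labels, and the string normalisation are all hypothetical and are not taken from the IconQA release or its evaluation script.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Problem:
    # All field names and label strings below are illustrative assumptions.
    question: str
    sub_task: str             # "choose_img", "choose_txt", or "fill_in_blank"
    choices: List[str]        # image names or text options; empty for blanks
    answer: Union[int, str]   # option index for choice tasks, text otherwise


def is_correct(problem: Problem, prediction: Union[int, str]) -> bool:
    """Compare option indices for choice tasks, normalised text for blanks."""
    if problem.sub_task in ("choose_img", "choose_txt"):
        return prediction == problem.answer
    return str(prediction).strip().lower() == str(problem.answer).strip().lower()


def accuracy_by_sub_task(
    problems: List[Problem], predictions: List[Union[int, str]]
) -> Dict[str, float]:
    """Return the percentage of correct predictions for each sub-task."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for problem, prediction in zip(problems, predictions):
        totals[problem.sub_task] += 1
        hits[problem.sub_task] += int(is_correct(problem, prediction))
    return {task: 100.0 * hits[task] / totals[task] for task in totals}
```

Keeping the sub-task label on each record makes it possible to report the three sub-tasks separately rather than as a single pooled number.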
Code Repositories
lupantech/iconqa
Official
pytorch
Benchmarks
| Benchmark | Method | Reasoning (Alg.) | Reasoning (Com.) | Reasoning (Cou.) | Reasoning (Est.) | Reasoning (Fra.) | Reasoning (Geo.) | Reasoning (Mea.) | Reasoning (Pat.) | Reasoning (Pro.) | Reasoning (Sce.) | Reasoning (Sen.) | Reasoning (Spa.) | Reasoning (Tim.) | Sub-tasks (Blank) | Sub-tasks (Img.) | Sub-tasks (Txt.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| visual-question-answering-on-iconqa | DFAF | 50.27 | 81.69 | 70.68 | 99.02 | 77.60 | 81.80 | 98.83 | 56.60 | 85.70 | 67.01 | 84.11 | 51.42 | 67.72 | 78.28 | 77.72 | 72.17 |
| visual-question-answering-on-iconqa | Q-Only | 28.02 | 48.19 | 33.63 | 40.46 | 33.06 | 38.03 | 38.07 | 33.66 | 40.76 | 35.37 | 45.25 | 37.14 | 48.09 | 28.45 | 41.64 | 36.86 |
| visual-question-answering-on-iconqa | ViLBERT | 50.62 | 75.60 | 71.05 | 99.22 | 74.09 | 80.05 | 99.07 | 62.78 | 70.94 | 58.52 | 81.78 | 49.46 | 66.72 | 77.08 | 76.66 | 70.47 |
| visual-question-answering-on-iconqa | I-Only | 31.73 | 45.26 | 37.64 | 62.29 | 32.48 | 38.71 | 64.02 | 36.29 | 37.51 | 35.47 | 45.25 | 37.52 | 47.37 | 46.65 | 41.56 | 36.02 |
| visual-question-answering-on-iconqa | Random | 11.12 | 41.20 | 18.38 | 3.62 | 34.84 | 30.30 | 0.36 | 34.81 | 38.81 | 34.25 | 45.16 | 36.49 | 35.82 | 0.29 | 41.70 | 36.87 |
| visual-question-answering-on-iconqa | ViLT | 50.55 | 84.95 | 71.13 | 99.02 | 75.81 | 82.61 | 98.91 | 59.22 | 87.65 | 66.72 | 86.10 | 53.38 | 69.99 | 79.27 | 79.67 | 72.69 |
| visual-question-answering-on-iconqa | UNITER | 49.18 | 83.67 | 71.01 | 99.41 | 78.37 | 81.31 | 99.38 | 60.81 | 87.84 | 61.25 | 86.10 | 48.34 | 69.77 | 78.53 | 78.71 | 72.39 |
| visual-question-answering-on-iconqa | MCAN | 47.32 | 82.73 | 68.94 | 99.08 | 76.20 | 79.86 | 98.99 | 54.79 | 84.87 | 62.49 | 83.25 | 49.70 | 68.00 | 74.52 | 77.36 | 71.25 |
| visual-question-answering-on-iconqa | BAN | 47.46 | 82.12 | 67.56 | 97.06 | 73.77 | 79.99 | 96.50 | 55.67 | 82.45 | 66.92 | 82.12 | 53.20 | 66.50 | 75.54 | 76.33 | 70.82 |
| visual-question-answering-on-iconqa | Patch-TRM | 56.73 | 87.00 | 77.81 | 98.24 | 82.13 | 81.87 | 97.98 | 68.75 | 95.73 | 62.39 | 92.49 | 55.62 | 77.98 | 83.62 | 82.66 | 75.19 |
| visual-question-answering-on-iconqa | ViT | 51.10 | 82.12 | 70.84 | 98.95 | 77.41 | 82.60 | 98.76 | 58.46 | 86.07 | 68.80 | 84.72 | 54.64 | 68.66 | 78.92 | 79.15 | 72.34 |
| visual-question-answering-on-iconqa | Top-Down | 50.00 | 80.65 | 65.01 | 99.54 | 72.43 | 80.07 | 99.46 | 55.01 | 83.75 | 58.22 | 84.54 | 45.78 | 68.28 | 73.03 | 75.92 | 68.51 |
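
To make the sub-task comparison in the table easier to read, the sketch below computes how far one method sits above the best of the others on each sub-task. The figures are copied from the Sub-tasks (Blank), (Img.), and (Txt.) entries of four rows of the table; the function name and data layout are illustrative choices, not part of any released tooling.

```python
from typing import Dict, Tuple

SUB_TASKS = ("Blank", "Img.", "Txt.")

# Sub-task scores copied from the benchmark table (Blank, Img., Txt.).
# Only four of the table's rows are included here.
SCORES: Dict[str, Tuple[float, float, float]] = {
    "Patch-TRM": (83.62, 82.66, 75.19),
    "ViLT": (79.27, 79.67, 72.69),
    "ViT": (78.92, 79.15, 72.34),
    "UNITER": (78.53, 78.71, 72.39),
}


def margin_over_best_other(
    scores: Dict[str, Tuple[float, float, float]], method: str
) -> Dict[str, float]:
    """Per sub-task gap between `method` and the best other listed method."""
    margins = {}
    for i, task in enumerate(SUB_TASKS):
        best_other = max(v[i] for name, v in scores.items() if name != method)
        margins[task] = round(scores[method][i] - best_other, 2)
    return margins


if __name__ == "__main__":
    for task, gap in margin_over_best_other(SCORES, "Patch-TRM").items():
        print(f"{task}: {gap:+.2f} points")
```

With these rows, ViLT is the strongest other method on all three sub-tasks, so each margin is measured against ViLT.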