VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Li Qing, Tao Qingyi, Joty Shafiq, Cai Jianfei, Luo Jiebo

Abstract

Most existing work in visual question answering (VQA) is dedicated to improving the accuracy of predicted answers while disregarding explanations. We argue that the explanation for an answer is as important as, or even more important than, the answer itself, since it makes the question-answering process more understandable and traceable. To this end, we propose a new task, VQA-E (VQA with Explanation), in which computational models are required to generate an explanation along with the predicted answer. We first construct a new dataset and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We have conducted a user study to validate the quality of the explanations synthesized by our method. We quantitatively show that the additional supervision from explanations not only produces insightful textual sentences that justify the answers but also improves the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.
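The abstract frames VQA-E as multi-task learning: one head predicts the answer and another generates the explanation, and both are trained jointly so that explanation supervision can also help answer prediction. The following is a minimal sketch of such a joint objective, not the authors' implementation. It assumes a soft-score binary cross-entropy over candidate answers for the answer head (a common choice on VQA v2) and token-level cross-entropy for the explanation decoder. The function names and the weighting factor `lam` are hypothetical.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logsumexp(xs: Sequence[float]) -> float:
    # Stable log(sum(exp(x))) by subtracting the max first.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def answer_loss(logits: Sequence[float], soft_targets: Sequence[float]) -> float:
    # Binary cross-entropy with logits, averaged over candidate answers.
    # Uses the identity BCE(z, t) = softplus(z) - t * z.
    # Assumption: soft_targets are VQA-style soft scores in [0, 1].
    terms = [softplus(z) - t * z for z, t in zip(logits, soft_targets)]
    return sum(terms) / len(terms)


def explanation_loss(step_logits: List[Sequence[float]], target_ids: Sequence[int]) -> float:
    # Token-level cross-entropy for the explanation decoder:
    # at each step, -log softmax(logits)[target], averaged over steps.
    terms = [logsumexp(lg) - lg[t] for lg, t in zip(step_logits, target_ids)]
    return sum(terms) / len(terms)


def multitask_loss(ans_logits, ans_targets, exp_logits, exp_ids, lam: float = 1.0) -> float:
    # Hypothetical joint objective: answer loss plus weighted explanation loss.
    return answer_loss(ans_logits, ans_targets) + lam * explanation_loss(exp_logits, exp_ids)


if __name__ == "__main__":
    loss = multitask_loss([0.0, 0.0], [1.0, 0.0], [[0.0, 0.0, 0.0]], [1], lam=1.0)
    print(round(loss, 4))  # log(2) + log(3) = log(6)
```

Setting `lam` to 0 reduces the objective to answer-only training. This gives a straightforward single-task baseline for checking whether the explanation signal actually helps answer accuracy.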

Benchmarks

Benchmark	Methodology	Metrics
explanatory-visual-question-answering-on-gqa	VQAE	BLEU-4: 42.56; CIDEr: 358.20; GQA-test: 57.24; GQA-val: 65.19; Grounding: 31.29; METEOR: 34.51; ROUGE-L: 73.59; SPICE: 40.39
