Aishwarya Agrawal; Jiasen Lu; Stanislaw Antol; Margaret Mitchell; C. Lawrence Zitnick; Dhruv Batra; Devi Parikh

Abstract
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or come from a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).
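Because answers are short, a predicted answer can be scored by string matching against the ten human answers collected per question. The full VQA paper defines accuracy for one question as min(number of humans who gave that answer / 3, 1), so an answer given by at least three annotators counts as fully correct. The sketch below illustrates that rule and the "Percentage correct" figure reported in the leaderboard. It is not the official evaluation script: the function names are hypothetical, and normalization is simplified to lowercasing and whitespace collapsing (the official script also handles punctuation, articles and number words, and averages over subsets of the human answers).

```python
from collections import Counter


def normalize(answer: str) -> str:
    # Simplified normalization (assumption): lowercase and collapse whitespace.
    # Number words such as "two" are NOT mapped to digits here.
    return " ".join(answer.lower().split())


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Score one predicted answer against the human answers for one question.

    acc = min(#human answers matching the prediction / 3, 1)
    """
    counts = Counter(normalize(a) for a in human_answers)
    return min(counts[normalize(predicted)] / 3.0, 1.0)


def percentage_correct(predictions: dict[str, str],
                       references: dict[str, list[str]]) -> float:
    """Mean per-question accuracy over all questions, as a percentage."""
    scores = [vqa_accuracy(predictions[qid], references[qid]) for qid in references]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data (hypothetical): ten human answers per question.
    refs = {
        "q1": ["yes"] * 9 + ["no"],
        "q2": ["2", "2", "3", "two", "2", "3", "3", "4", "2", "2"],
    }
    preds = {"q1": "Yes", "q2": "4"}  # "Yes" matches 9 answers, "4" matches 1
    print(f"{percentage_correct(preds, refs):.2f}")  # prints 66.67
```

The threshold of three matching annotators makes the score tolerant of legitimate disagreement among humans: a prediction does not need to match the majority answer to earn full credit.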
Benchmarks
| Benchmark | Method | Percentage correct (%) |
|---|---|---|
| visual-question-answering-on-coco-visual | DLAIT | 68.07 |
| visual-question-answering-on-coco-visual | HDU-USYD-UNCC | 68.16 |
| visual-question-answering-on-coco-visual-1 | LSTM Q+I | 63.1 |
| visual-question-answering-on-coco-visual-2 | LSTM + global features | 65.02 |
| visual-question-answering-on-coco-visual-2 | Dualnet ensemble | 69.73 |
| visual-question-answering-on-coco-visual-2 | LSTM blind | 57.19 |
| visual-question-answering-on-coco-visual-3 | Dualnet ensemble | 71.18 |
| visual-question-answering-on-coco-visual-3 | LSTM + global features | 69.21 |
| visual-question-answering-on-coco-visual-3 | LSTM blind | 61.41 |
| visual-question-answering-on-coco-visual-4 | LSTM Q+I | 58.2 |