HyperAI

Towards VQA Models That Can Read

Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach

Abstract

Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today's VQA models can not read! Our paper takes a first step towards addressing this problem. First, we introduce a new "TextVQA" dataset to facilitate progress on this important problem. Existing datasets either have a small proportion of questions about text (e.g., the VQA dataset) or are too small (e.g., the VizWiz dataset). TextVQA contains 45,336 questions on 28,408 images that require reasoning about text to answer. Second, we introduce a novel model architecture that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the image. Consequently, we call our approach Look, Read, Reason & Answer (LoRRA). We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset. We find that the gap between human performance and machine performance is significantly larger on TextVQA than on VQA 2.0, suggesting that TextVQA is well-suited to benchmark progress along directions complementary to VQA 2.0.
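The key architectural idea in the abstract is that the answer can either be drawn from a fixed answer vocabulary or copied directly from the OCR tokens found in the image. Below is a minimal PyTorch sketch of that joint answer space; it is an illustration under assumed names and dimensions, not the authors' implementation (the official code lives in facebookresearch/pythia and facebookresearch/mmf).

```python
import torch
import torch.nn as nn


class AnswerModule(nn.Module):
    """Hypothetical sketch: score a fixed answer vocabulary and, in addition,
    each OCR token detected in the image, so the argmax over the concatenated
    scores either selects a vocabulary answer or copies a string from the image."""

    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        # Classifier over the fixed answer vocabulary.
        self.vocab_head = nn.Linear(hidden_dim, vocab_size)
        # Projection used to score OCR-token embeddings against the context.
        self.copy_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, context: torch.Tensor, ocr_feats: torch.Tensor) -> torch.Tensor:
        # context:   (batch, hidden_dim)        fused image+question representation
        # ocr_feats: (batch, n_ocr, hidden_dim) embeddings of OCR tokens in the image
        vocab_scores = self.vocab_head(context)            # (batch, vocab_size)
        query = self.copy_proj(context).unsqueeze(1)       # (batch, 1, hidden_dim)
        copy_scores = (query * ocr_feats).sum(dim=-1)      # (batch, n_ocr)
        # Joint answer space: fixed vocabulary followed by per-image OCR tokens.
        return torch.cat([vocab_scores, copy_scores], dim=-1)


if __name__ == "__main__":
    module = AnswerModule(hidden_dim=8, vocab_size=10)
    scores = module(torch.randn(2, 8), torch.randn(2, 5, 8))
    print(scores.shape)  # (2, 15): 10 vocabulary answers + 5 OCR tokens
```

Indices past `vocab_size` in the output correspond to copying the matching OCR token, which is how an answer "composed of the strings found in the image" can be produced even when it never appears in the training vocabulary.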

Code Repositories

facebookresearch/pythia (official, PyTorch)
ronghanghu/pythia (PyTorch)
allenai/pythia (PyTorch)
jackroos/pythia (PyTorch)
facebookresearch/mmf (PyTorch)
zwxalgorithm/pythia (PyTorch)
ZephyrZhuQi/ssbaseline (PyTorch)

Benchmarks

Benchmark                                     | Methodology         | Metrics
Visual Question Answering on VizWiz 2018      | Pythia v0.3         | overall: 54.72
Visual Question Answering on VQA v2 test-dev  | Pythia v0.3 + LoRRA | Accuracy: 69.21
