Retrieval Augmented Visual Question Answering with Outside Knowledge

Weizhe Lin, Bill Byrne

Abstract

Outside-Knowledge Visual Question Answering (OK-VQA) is a challenging VQA task that requires retrieval of external knowledge to answer questions about images. Recent OK-VQA systems use Dense Passage Retrieval (DPR) to retrieve documents from external knowledge bases such as Wikipedia, but they train DPR separately from answer generation, which potentially limits overall system performance. Instead, we propose a joint training scheme that integrates differentiable DPR with answer generation so that the system can be trained end to end. Our experiments show that our scheme outperforms recent OK-VQA systems with strong DPR for retrieval. We also introduce new diagnostic metrics to analyze how retrieval and generation interact. The strong retrieval ability of our model substantially reduces the number of retrieved documents needed in training, which benefits both answer quality and the computation required for training.
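To make the idea of training retrieval and generation together more concrete, here is a minimal sketch of one common way to couple them: retrieval scores (inner products between a query embedding and document embeddings) are normalized over the retrieved documents and used to weight the generator's likelihood of the answer given each document. This is a generic marginalization objective written under stated assumptions, not necessarily the loss used in the paper. The function names, the toy vectors, and the per-document answer log-likelihoods (`gen_log_probs`) are all hypothetical, and a real system would compute these quantities as tensors in an autodiff framework so that gradients reach the retriever.

```python
import math


def dot(u, v):
    """Inner product, the usual DPR-style relevance score."""
    return sum(a * b for a, b in zip(u, v))


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def joint_nll(query_vec, doc_vecs, gen_log_probs):
    """Negative log of sum_k p(doc_k | query) * p(answer | query, doc_k).

    gen_log_probs[k] is an assumed generator log-likelihood of the
    reference answer when conditioned on the k-th retrieved document.
    """
    retr_log_probs = log_softmax([dot(query_vec, d) for d in doc_vecs])
    joint = [r + g for r, g in zip(retr_log_probs, gen_log_probs)]
    m = max(joint)
    return -(m + math.log(sum(math.exp(j - m) for j in joint)))


# Toy example: document 0 supports the answer better than document 1.
docs = [[1.0, 0.0], [0.0, 1.0]]
gen_lp = [math.log(0.5), math.log(0.1)]
loss_far = joint_nll([1.0, 0.0], docs, gen_lp)
loss_near = joint_nll([2.0, 0.0], docs, gen_lp)
```

In this toy setting the loss drops when the query embedding scores the more helpful document higher, which is the signal a jointly trained retriever could learn from.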

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| retrieval-on-ok-vqa | RA-VQA | Recall@5: 82.84 |
| visual-question-answering-on-ok-vqa | RA-VQA-FrDPR (T5-large) | Accuracy: 51.22; Exact Match (EM): 55.77; Recall@5: 81.25 |
| visual-question-answering-on-ok-vqa | RA-VQA (T5-large) | Accuracy: 54.48; Exact Match (EM): 59.41; Recall@5: 82.84 |
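
For readers unfamiliar with the metrics listed above, the following sketch shows how Recall@K and Exact Match are commonly computed. It rests on assumptions: the page does not state how a retrieved document is judged relevant or how answers are normalized before comparison, so the boolean relevance flags and the lowercase/strip normalization here are illustrative choices, and the function names are hypothetical.

```python
def recall_at_k(ranked_relevance, k=5):
    """Percentage of questions with at least one relevant document in the top k.

    ranked_relevance: one list per question, holding booleans in rank order
    (True means the document at that rank is judged relevant).
    """
    hits = sum(1 for flags in ranked_relevance if any(flags[:k]))
    return 100.0 * hits / len(ranked_relevance)


def exact_match(predictions, answer_sets):
    """Percentage of predictions that equal any reference answer.

    Comparison is case-insensitive and ignores surrounding whitespace
    (an assumed normalization, not taken from the benchmark).
    """
    hits = 0
    for pred, answers in zip(predictions, answer_sets):
        normalized = {a.strip().lower() for a in answers}
        if pred.strip().lower() in normalized:
            hits += 1
    return 100.0 * hits / len(predictions)


# Toy data: four questions for retrieval, two for answer matching.
relevance = [
    [True],
    [False, False],
    [False, False, False, False, False, True],  # relevant doc is at rank 6
    [False, True],
]
r5 = recall_at_k(relevance, k=5)
em = exact_match(["Cat", "dog"], [["cat", "kitten"], ["bird"]])
```

Note that the third question has a relevant document only at rank 6, so it does not count toward Recall@5 but would count toward Recall@6.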
