
A Simple Baseline for Knowledge-Based Visual Question Answering

Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos

Abstract

This paper addresses the problem of Knowledge-Based Visual Question Answering (KB-VQA). Recent works have emphasized the importance of incorporating both explicit knowledge (from external databases) and implicit knowledge (from LLMs) to effectively answer questions that require external knowledge. A common limitation of such approaches is that they consist of relatively complicated pipelines and often rely heavily on access to the GPT-3 API. Our main contribution is a much simpler and readily reproducible pipeline which, in a nutshell, is based on efficient in-context learning by prompting LLaMA (1 and 2) with question-informative captions as contextual information. Unlike recent approaches, our method is training-free and requires no access to external databases or APIs, yet it achieves state-of-the-art accuracy on the OK-VQA and A-OK-VQA datasets. Finally, we perform several ablation studies to understand important aspects of our method. Our code is publicly available at https://github.com/alexandrosXe/ASimple-Baseline-For-Knowledge-Based-VQA
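
The core idea, prompting an LLM with question-informative captions as context for few-shot in-context learning, can be illustrated with a minimal prompt-assembly sketch in Python. This is not the authors' implementation. The template strings ("Context:", "Question:", "Answer:"), the header sentence, and the names `Example`, `format_example`, and `build_prompt` are hypothetical. The sketch also assumes the captions have already been produced by some captioning model. The resulting string would then be passed to LLaMA for completion, and that call is omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One caption/question/answer triple (hypothetical field names)."""
    caption: str   # question-informative caption used as context
    question: str
    answer: str = ""  # left empty for the test query


def format_example(ex: Example, include_answer: bool) -> str:
    # Assumed template: caption as context, then the question, then the answer slot.
    lines = [f"Context: {ex.caption}", f"Question: {ex.question}"]
    lines.append(f"Answer: {ex.answer}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(shots: List[Example], query: Example,
                 header: str = "Please answer the question according to the context.") -> str:
    """Assemble a few-shot prompt: header, solved examples, then the unsolved query."""
    blocks = [header]
    blocks += [format_example(s, include_answer=True) for s in shots]
    blocks.append(format_example(query, include_answer=False))
    return "\n\n".join(blocks)


# Usage with made-up captions and questions:
shots = [Example("A red double-decker bus on a city street.",
                 "In which country are these buses common?", "england")]
query = Example("A man riding a wave on a surfboard.", "What is this sport called?")
prompt = build_prompt(shots, query)
print(prompt)
```

Each in-context example uses the same layout as the query. As a result, the model's natural continuation after the final "Answer:" is its predicted answer. The sketch deliberately leaves open how the in-context examples are selected and how many are used.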

Benchmarks

Benchmark | Methodology | Metrics
visual-question-answering-on-a-okvqa | A Simple Baseline for KB-VQA | DA (direct answer) VQA Score: 57.5
visual-question-answering-on-ok-vqa | A Simple Baseline for KB-VQA | Accuracy: 61.2
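
For reference, assume both numbers follow the standard VQA evaluation protocol, which OK-VQA and the A-OKVQA direct-answer setting adopt. Under that protocol, a prediction earns min(#matching annotators / 3, 1), averaged over every leave-one-out subset of the human answers (typically ten). The sketch below uses plain lowercase exact matching. The official evaluation script applies additional answer normalization (punctuation, articles, number words), so its scores can differ slightly. The function name `vqa_accuracy` is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def vqa_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    """Standard VQA accuracy with simplified (lowercase, stripped) answer matching."""
    if not human_answers:
        raise ValueError("need at least one human answer")
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    n = len(answers)
    # Average min(matches / 3, 1) over all subsets that leave out one annotator.
    subsets = list(combinations(range(n), n - 1))
    total = 0.0
    for subset in subsets:
        matches = sum(1 for i in subset if answers[i] == pred)
        total += min(matches / 3.0, 1.0)
    return total / len(subsets)


# Three of ten annotators said "dog": (3 * 2/3 + 7 * 1) / 10 = 0.9
print(vqa_accuracy("Dog", ["dog"] * 3 + ["cat"] * 7))
```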
