Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training

Tiong Anthony Meng Huat; Li Junnan; Li Boyang; Savarese Silvio; Hoi Steven C. H.

Abstract

Visual question answering (VQA) is a hallmark of vision and language reasoning and a challenging task under the zero-shot setting. We propose Plug-and-Play VQA (PNP-VQA), a modular framework for zero-shot VQA. In contrast to most existing works, which require substantial adaptation of pretrained language models (PLMs) for the vision modality, PNP-VQA requires no additional training of the PLMs. Instead, we propose to use natural language and network interpretation as an intermediate representation that glues pretrained models together. We first generate question-guided informative image captions, and pass the captions to a PLM as context for question answering. Surpassing end-to-end trained baselines, PNP-VQA achieves state-of-the-art results on zero-shot VQAv2 and GQA. With 11B parameters, it outperforms the 80B-parameter Flamingo model by 8.5% on VQAv2. With 738M PLM parameters, PNP-VQA achieves an improvement of 9.1% on GQA over FewVLM with 740M PLM parameters. Code is released at https://github.com/salesforce/LAVIS/tree/main/projects/pnp-vqa
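
The two-stage flow described in the abstract (question-guided captions first, then a PLM that answers from those captions as text context) can be sketched roughly as below. This is only an illustrative skeleton, not the released LAVIS code: the names `answer_question`, `build_context`, `toy_captioner`, and `toy_reader` are hypothetical, the caption de-duplication step is an assumption added for the example, and the two toy functions stand in for real pretrained models.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces for the two frozen components (not the LAVIS API):
#   captioner(image, question, n) -> up to n captions focused on the question
#   reader(question, context)     -> an answer string derived from text only
Captioner = Callable[[object, str, int], List[str]]
Reader = Callable[[str, str], str]


def build_context(captions: Sequence[str]) -> str:
    """Join captions into one text context, dropping case-insensitive repeats.

    De-duplication is an assumption made for this sketch.
    """
    seen = set()
    unique = []
    for caption in captions:
        text = caption.strip()
        key = text.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(text)
    return " ".join(unique)


def answer_question(image: object, question: str,
                    captioner: Captioner, reader: Reader,
                    num_captions: int = 5) -> str:
    """Caption the image with the question in mind, then answer from text."""
    captions = captioner(image, question, num_captions)
    context = build_context(captions)
    # The reader never sees the image; captions are the only bridge.
    return reader(question, context)


# Toy stand-ins so the sketch runs without any pretrained models.
def toy_captioner(image: object, question: str, n: int) -> List[str]:
    pool = [
        "A red bus is parked on the street.",
        "a red bus is parked on the street.",
        "People wait near the bus stop.",
    ]
    return pool[:n]


def toy_reader(question: str, context: str) -> str:
    for color in ("red", "blue", "green"):
        if color in context.lower():
            return color
    return "unknown"


if __name__ == "__main__":
    print(answer_question(None, "What color is the bus?",
                          toy_captioner, toy_reader))
```

Because both components are passed in as plain callables, either one can be swapped for a different pretrained model without retraining the other, which is the modularity the abstract emphasizes.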

Code Repositories

salesforce/lavis (official, PyTorch, mentioned in GitHub)
abril4416/kgen_vqa (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
visual-question-answering-on-gqa-test-dev | PNP-VQA | Accuracy: 41.9
visual-question-answering-on-ok-vqa | PNP-VQA | Accuracy: 35.9
visual-question-answering-on-vqa-v2-test-dev | PNP-VQA | Accuracy: 64.8
visual-question-answering-on-vqa-v2-val | PNP-VQA | Accuracy: 63.3
