Deep Modular Co-Attention Networks for Visual Question Answering

Zhou Yu; Jun Yu; Yuhao Cui; Dacheng Tao; Qi Tian

Abstract

Visual Question Answering (VQA) requires a fine-grained, simultaneous understanding of both the visual content of images and the textual content of questions. Designing an effective "co-attention" model that associates key words in questions with key objects in images is therefore central to VQA performance. So far, the most successful attempts at co-attention learning have used shallow models, and deep co-attention models have shown little improvement over their shallow counterparts. In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth. Each MCA layer jointly models the self-attention of questions and images, as well as the question-guided attention of images, using a modular composition of two basic attention units. We quantitatively and qualitatively evaluate MCAN on the benchmark VQA-v2 dataset and conduct extensive ablation studies to explore the reasons behind MCAN's effectiveness. Experimental results demonstrate that MCAN significantly outperforms the previous state of the art. Our best single model achieves 70.63% overall accuracy on the test-dev set. Code is available at https://github.com/MILVLG/mcan-vqa.
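As a rough illustration of how two basic attention units can be composed into layers and cascaded in depth, the Python sketch below implements a simplified self-attention (SA) unit and guided-attention (GA) unit using single-head scaled dot-product attention with a residual connection, combines them into one MCA-style layer (SA on the question; SA followed by question-guided GA on the image), and stacks several such layers. It is not the MCAN implementation: learned projections, multiple heads, the feed-forward sublayer, layer normalization and the answer-prediction head are all omitted. The helper names (`self_attention`, `guided_attention`, `mca_layer`, `mcan_stack`) and the toy feature values are hypothetical, and the simple stacking scheme, in which each layer's outputs feed the next, is an assumption; the official repository holds the actual architecture.

```python
# Hypothetical, simplified sketch of SA/GA units and a cascade of MCA-style layers.
# Not the official MCAN code: no learned weights, single head, no FFN or LayerNorm.
import math
from typing import List, Tuple

Matrix = List[List[float]]  # each row is one feature vector (a word or an image region)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here as the residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    scores = [[s * scale for s in row] for row in matmul(queries, transpose(keys))]
    return matmul(softmax_rows(scores), values)


def self_attention(x: Matrix) -> Matrix:
    """Simplified SA unit: each feature attends over its own modality, plus a residual."""
    return add(x, attention(x, x, x))


def guided_attention(x: Matrix, guide: Matrix) -> Matrix:
    """Simplified GA unit: features x attend over the guiding features, plus a residual."""
    return add(x, attention(x, guide, guide))


def mca_layer(question: Matrix, image: Matrix) -> Tuple[Matrix, Matrix]:
    """One hypothetical MCA-style layer: SA on the question; SA then question-guided GA on the image."""
    q_out = self_attention(question)
    img_out = guided_attention(self_attention(image), q_out)
    return q_out, img_out


def mcan_stack(question: Matrix, image: Matrix, depth: int) -> Tuple[Matrix, Matrix]:
    """Cascade `depth` layers, feeding each layer's outputs into the next (assumed stacking)."""
    for _ in range(depth):
        question, image = mca_layer(question, image)
    return question, image


# Toy usage with hypothetical 2-dimensional features.
question = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 "word" vectors
image = [[0.5, 0.5], [1.0, -1.0]]                # 2 "region" vectors
q_out, v_out = mcan_stack(question, image, depth=6)
print(len(q_out), len(v_out))  # 3 2
```

Because each unit returns features with the same shape as its queries, layers compose freely: the question keeps its 3 word vectors and the image its 2 region vectors at any depth, which is what makes cascading layers in depth straightforward.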

Code Repositories

apugoneappu/ask_me_anything (PyTorch, mentioned in GitHub)
MILVLG/mcan-vqa (official, PyTorch)
apugoneappu/vqa_visualise (PyTorch, mentioned in GitHub)
vikrantmane7781/detectroon2 (PyTorch, mentioned in GitHub)
hieunghia-pat/UIT-MCAN (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Method | Metric
question-answering-on-sqa3d | MCAN | AnswerExactMatch (Question Answering): 43.42
visual-question-answering-on-vqa-v2-test-dev | MCANed-6 | Accuracy: 70.63
visual-question-answering-on-vqa-v2-test-std | MCANed-6 | overall: 70.9
