Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

Kexin Yi∗ (Harvard University), Jiajun Wu∗ (MIT CSAIL), Chuang Gan (MIT-IBM Watson AI Lab), Antonio Torralba (MIT CSAIL), Pushmeet Kohli (DeepMind)

Abstract

We marry two powerful ideas: deep representation learning for visual recognition and language understanding, and symbolic program execution for reasoning. Our neural-symbolic visual question answering (NS-VQA) system first recovers a structural scene representation from the image and a program trace from the question. It then executes the program on the scene representation to obtain an answer. Incorporating symbolic structure as prior knowledge offers three unique advantages. First, executing programs in a symbolic space is more robust to long program traces, so our model solves complex reasoning tasks better, achieving an accuracy of 99.8% on the CLEVR dataset. Second, the model is more data- and memory-efficient: it performs well after training on only a small amount of data, and it encodes an image into a compact representation that requires less storage than existing methods for offline question answering. Third, symbolic program execution makes the reasoning process fully transparent, so we can interpret and diagnose each execution step.
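The abstract's pipeline (a structural scene representation, a program trace, and symbolic execution of the program on the scene) can be illustrated with a toy executor. This is a minimal sketch, not the authors' implementation. The scene format, the module names (`scene`, `filter_<attr>`, `query_<attr>`, `count`, `exist`) and the hand-written programs are assumptions loosely modeled on CLEVR-style functional programs. In NS-VQA the scene and the program come from learned neural parsers, which are omitted here.

```python
from typing import Any

# Hypothetical structural scene representation: one record per object,
# with CLEVR-style attributes (assumed for illustration only).
SCENE = [
    {"shape": "cube", "color": "red", "size": "large", "material": "metal"},
    {"shape": "sphere", "color": "blue", "size": "small", "material": "rubber"},
    {"shape": "cylinder", "color": "red", "size": "small", "material": "rubber"},
]

# Hypothetical program trace a question parser might emit for
# "How many red things are there?"
PROGRAM = [("scene", None), ("filter_color", "red"), ("count", None)]


def run(program, scene):
    """Execute a program step by step, recording every intermediate result."""
    state: Any = None
    trace = []
    for op, arg in program:
        if op == "scene":
            # Start from the indices of all objects in the scene.
            state = list(range(len(scene)))
        elif op.startswith("filter_"):
            # Keep only objects whose attribute matches the argument.
            attr = op[len("filter_"):]
            state = [i for i in state if scene[i][attr] == arg]
        elif op.startswith("query_"):
            # Read one attribute of a uniquely identified object.
            attr = op[len("query_"):]
            if len(state) != 1:
                raise ValueError(f"{op} expects one object, got {len(state)}")
            state = scene[state[0]][attr]
        elif op == "count":
            state = len(state)
        elif op == "exist":
            state = len(state) > 0
        else:
            raise ValueError(f"unknown module: {op}")
        # Every step is logged, so each intermediate result can be inspected.
        trace.append((op, arg, state))
    return state, trace


if __name__ == "__main__":
    answer, trace = run(PROGRAM, SCENE)
    for step in trace:
        print(step)
    print("answer:", answer)
```

Running the script prints the three steps `('scene', None, [0, 1, 2])`, `('filter_color', 'red', [0, 2])` and `('count', None, 2)`, followed by `answer: 2`. Because each module's output is kept in `trace`, a wrong answer can be attributed to the specific step where it went wrong, which is the kind of step-by-step diagnosis the abstract describes.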
