Modeling Relationships in Referential Expressions with Compositional Modular Networks

Ronghang Hu; Marcus Rohrbach; Jacob Andreas; Trevor Darrell; Kate Saenko

Abstract

People often refer to entities in an image in terms of their relationships with other entities. For example, "the black cat sitting under the table" refers to both a "black cat" entity and its relationship with another "table" entity. Understanding these relationships is essential for interpreting and grounding such natural language expressions. Most prior work focuses on either grounding entire referential expressions holistically to one region, or localizing relationships based on a fixed set of categories. In this paper we instead present a modular deep architecture capable of analyzing referential expressions into their component parts, identifying entities and relationships mentioned in the input expression and grounding them all in the scene. We call this approach Compositional Modular Networks (CMNs): a novel architecture that learns linguistic analysis and visual inference end-to-end. Our approach is built around two types of neural modules that inspect local regions and pairwise interactions between regions. We evaluate CMNs on multiple referential expression datasets, outperforming state-of-the-art approaches on all tasks.
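The two module types mentioned in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names (`localization_scores`, `relationship_scores`, `ground`), the dot-product scoring, the additive combination of scores, and the hand-set query vectors are not taken from the abstract, which says only that the approach learns linguistic analysis and visual inference end-to-end. The sketch scores every ordered pair of regions for an expression like "the black cat sitting under the table" and returns the best (subject, object) pair.

```python
from itertools import product


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def localization_scores(region_feats, query):
    """Hypothetical unary module: one score per region."""
    return [dot(feat, query) for feat in region_feats]


def relationship_scores(region_centers, query):
    """Hypothetical pairwise module: one score per ordered region pair,
    computed from the offset between the two region centers."""
    scores = {}
    for (i, ci), (j, cj) in product(enumerate(region_centers), repeat=2):
        offset = [cj[0] - ci[0], cj[1] - ci[1]]
        scores[(i, j)] = dot(offset, query)
    return scores


def ground(region_feats, region_centers, q_subj, q_rel, q_obj):
    """Return the (subject, object) region pair with the highest
    combined score (assumed additive combination)."""
    subj = localization_scores(region_feats, q_subj)
    obj = localization_scores(region_feats, q_obj)
    rel = relationship_scores(region_centers, q_rel)
    return max(rel, key=lambda p: subj[p[0]] + rel[p] + obj[p[1]])


# Toy scene: region 0 = cat, 1 = table, 2 = dog.
# Features are [cat-ness, table-ness]; centers are (x, y), y grows downward.
feats = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.1]]
centers = [(5, 8), (5, 4), (1, 8)]

# Hand-set query vectors standing in for learned text embeddings.
q_cat, q_under, q_table = [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]

best_pair = ground(feats, centers, q_cat, q_under, q_table)
print(best_pair)  # (0, 1): the cat as subject, the table as object
```

In this toy setup the pair (cat, table) wins because the cat region matches the subject query, the table region matches the object query, and the table sits above the cat. In a trained model the queries and scoring functions would be learned rather than hand-set.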

Code Repositories

thilinicooray/Bottom-up-vqa (PyTorch, mentioned in GitHub)
hengyuan-hu/bottom-up-attention-vqa (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
visual-question-answering-on-visual-genome | CMN | Percentage correct: 44.24
visual-question-answering-on-visual-genome-1 | CMN | Percentage correct: 28.52
visual-question-answering-on-visual7w | CMN | Percentage correct: 72.53
