5 months ago

Who's Waldo? Linking People Across Text and Images

Claire Yuqing Cui; Apoorv Khandelwal; Yoav Artzi; Noah Snavely; Hadar Averbuch-Elor

Abstract

We present a task and benchmark dataset for person-centric visual grounding, the problem of linking between people named in a caption and people pictured in an image. In contrast to prior work in visual grounding, which is predominantly object-based, our new task masks out the names of people in captions in order to encourage methods trained on such image-caption pairs to focus on contextual cues (such as rich interactions between multiple people), rather than learning associations between names and appearances. To facilitate this task, we introduce a new dataset, Who's Waldo, mined automatically from image-caption data on Wikimedia Commons. We propose a Transformer-based method that outperforms several strong baselines on this task, and are releasing our data to the research community to spur work on contextual models that consider both vision and language.
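The name-masking step described in the abstract can be illustrated with a minimal sketch. It assumes the person names are already known and supplied by hand, and it assumes a placeholder token written as `[NAME]`; the function name `mask_names` and the sample caption are hypothetical and are not taken from the authors' code, which may detect and mask names differently.

```python
import re

def mask_names(caption, names, token="[NAME]"):
    """Replace each listed person name in the caption with a placeholder token.

    Illustrative only: the name list is supplied by the caller, and the
    token string is an assumption rather than the dataset's confirmed format.
    """
    # Try longer names first so "Ann Lee" is masked before the shorter "Ann".
    ordered = sorted(set(names), key=len, reverse=True)
    if not ordered:
        return caption
    # Word boundaries keep a short name from matching inside a longer word.
    pattern = re.compile("|".join(r"\b" + re.escape(n) + r"\b" for n in ordered))
    return pattern.sub(token, caption)

caption = "Ann Lee shakes hands with Bob Ray while Ann smiles."
masked = mask_names(caption, ["Ann Lee", "Bob Ray", "Ann"])
print(masked)
```

With the names hidden, a model that sees only the masked caption has to rely on context such as "shakes hands with" to decide which pictured person each placeholder refers to, rather than on a memorized link between a name and a face.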

Code Repositories

clairecyq/whos-waldo (official, PyTorch)

Benchmarks

Benchmark: person-centric-visual-grounding-on-whos-waldo
Methodology: Who's Waldo
Metrics: Accuracy: 63.5
