
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Hu Hexiang; Luan Yi; Chen Yang; Khandelwal Urvashi; Joshi Mandar; Lee Kenton; Toutanova Kristina; Chang Ming-Wei

Abstract

Large-scale multi-modal pre-training models such as CLIP and PaLI exhibit strong generalization across various visual domains and tasks. However, existing image classification benchmarks often evaluate recognition on a specific domain (e.g., outdoor images) or a specific task (e.g., classifying plant species), which falls short of evaluating whether pre-trained foundation models are universal visual recognizers. To address this, we formally present the task of Open-domain Visual Entity recognitioN (OVEN), where a model needs to link an image to a Wikipedia entity with respect to a text query. We construct OVEN-Wiki by re-purposing 14 existing datasets with all labels grounded in a single label space: Wikipedia entities. OVEN challenges models to select among six million possible Wikipedia entities, making it the general visual recognition benchmark with the largest number of labels. Our study of state-of-the-art pre-trained models reveals large headroom in generalizing to this massive-scale label space. We show that a PaLI-based auto-regressive visual recognition model performs surprisingly well, even on Wikipedia entities that were never seen during fine-tuning. We also find that existing pre-trained models have different strengths: while PaLI-based models obtain higher overall performance, CLIP-based models are better at recognizing tail entities.
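OVEN frames recognition as picking one entity out of a very large label space given an image and a text query. The sketch below shows, under stated assumptions, how a CLIP-style dual encoder could do this: embed the query and every entity into a shared vector space, then rank entities by cosine similarity. It is not the paper's CLIP2CLIP implementation. The embeddings are plain lists standing in for encoder outputs, fusing image and text by summing unit vectors is an illustrative assumption, and the helper names (`normalize`, `fuse_query`, `top_k_entities`) and entity IDs are hypothetical.

```python
# Hypothetical sketch of dual-encoder (CLIP-style) entity retrieval.
# Embeddings are plain Python lists standing in for encoder outputs;
# no real CLIP model is loaded here.
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm, so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse_query(image_emb: Sequence[float], text_emb: Sequence[float]) -> List[float]:
    """Combine image and text-query embeddings by summing their unit vectors.

    Summation is an illustrative assumption, not the paper's fusion method.
    """
    img, txt = normalize(image_emb), normalize(text_emb)
    return normalize([a + b for a, b in zip(img, txt)])


def top_k_entities(
    query: Sequence[float],
    entity_index: Dict[str, List[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank entities by cosine similarity to the query (exact linear scan).

    Assumes every vector in entity_index is already unit-normalized.
    """
    q = normalize(query)
    scored = ((sum(a * b for a, b in zip(q, emb)), eid) for eid, emb in entity_index.items())
    return [(eid, score) for score, eid in heapq.nlargest(k, scored)]


# Toy index with hypothetical entity IDs and 3-dimensional embeddings.
index = {
    "E1": normalize([1.0, 0.0, 0.0]),
    "E2": normalize([0.0, 1.0, 0.0]),
    "E3": normalize([0.7, 0.7, 0.0]),
}
query = fuse_query([1.0, 0.1, 0.0], [0.9, 0.0, 0.1])
print(top_k_entities(query, index, k=2))
```

The linear scan keeps the example self-contained; with a label space of six million entities, a practical system would more likely precompute entity embeddings and query an approximate nearest-neighbor index instead.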

Code Repositories

open-vision-language/oven (mentioned in GitHub)
edchengg/oven_eval (official; mentioned in GitHub)

Benchmarks

| Benchmark                               | Method     | Accuracy |
|-----------------------------------------|------------|----------|
| fine-grained-image-recognition-on-oven  | PaLI (17B) | 20.2     |
| fine-grained-image-recognition-on-oven  | CLIP2CLIP  | 5.3      |
| fine-grained-image-recognition-on-oven  | PaLI (3B)  | 11.8     |
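The table reports a single accuracy figure per model, while the abstract distinguishes entities seen during fine-tuning from unseen ones. The sketch below computes exact-match accuracy over entity IDs, overall and per split. This page does not state how the leaderboard figure is aggregated; the harmonic mean of the seen and unseen accuracies is included only as an assumption, and the official evaluation code (edchengg/oven_eval) is the authority. The function name and entity IDs are hypothetical.

```python
# Hypothetical sketch: exact-match accuracy over entity IDs, split by whether
# the gold entity was seen during fine-tuning.
from typing import Dict, List, Set


def split_accuracy(predictions: List[str], gold: List[str], seen: Set[str]) -> Dict[str, float]:
    """Return seen, unseen, overall and harmonic-mean accuracy."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    totals = {"seen": 0, "unseen": 0}
    correct = {"seen": 0, "unseen": 0}
    for pred, ref in zip(predictions, gold):
        # Each example is assigned to a split by its gold entity.
        split = "seen" if ref in seen else "unseen"
        totals[split] += 1
        correct[split] += int(pred == ref)
    acc = {s: (correct[s] / totals[s] if totals[s] else 0.0) for s in totals}
    n = totals["seen"] + totals["unseen"]
    acc["overall"] = (correct["seen"] + correct["unseen"]) / n if n else 0.0
    # Harmonic mean of the two splits: an assumed aggregate, not confirmed by this page.
    s, u = acc["seen"], acc["unseen"]
    acc["harmonic_mean"] = 2 * s * u / (s + u) if s + u > 0 else 0.0
    return acc


result = split_accuracy(
    predictions=["E1", "E2", "E9", "E4"],
    gold=["E1", "E2", "E3", "E4"],
    seen={"E1", "E2"},
)
print(result)
```

A harmonic mean penalizes a model that scores well on seen entities but poorly on unseen ones more than a plain average would, which is why it is a common choice when both splits matter.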
