Entity Linking in 100 Languages

Jan A. Botha, Zifei Shan, Daniel Gillick

Abstract

We propose a new formulation for multilingual entity linking, where language-specific mentions resolve to a language-agnostic Knowledge Base. We train a dual encoder in this new setting, building on prior work with improved feature representation, negative mining, and an auxiliary entity-pairing task, to obtain a single entity retrieval model that covers 100+ languages and 20 million entities. The model outperforms state-of-the-art results from a far more limited cross-lingual linking task. Rare entities and low-resource languages pose challenges at this large scale, so we advocate for an increased focus on zero- and few-shot evaluation. To this end, we provide Mewsli-9, a large new multilingual dataset (http://goo.gle/mewsli-dataset) matched to our setting, and show how frequency-based analysis provided key insights for our model and training enhancements.
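The abstract describes a dual encoder that maps language-specific mentions and language-agnostic Knowledge Base entries into a shared vector space, so that linking reduces to nearest-neighbor retrieval over roughly 20 million entity embeddings. The sketch below illustrates that general setup in PyTorch (the framework of the linked repository); it is not the authors' released code, and the encoder architecture, dimensions, and batch fields are illustrative assumptions. Training uses an in-batch softmax, a common simple form of the negative mining mentioned in the abstract.

```python
# Minimal dual-encoder sketch for mention-to-entity retrieval.
# NOTE: illustrative assumption only; not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualEncoder(nn.Module):
    """Maps mentions (in any language) and entities into a shared vector space."""

    def __init__(self, vocab_size: int = 30_000, dim: int = 300):
        super().__init__()
        # Shared subword embedding table; separate projection heads for
        # the mention side and the entity side.
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.mention_proj = nn.Linear(dim, dim)
        self.entity_proj = nn.Linear(dim, dim)

    def encode_mentions(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: [batch, seq_len] subword IDs of the mention in context.
        return F.normalize(self.mention_proj(self.embed(token_ids)), dim=-1)

    def encode_entities(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: [batch, seq_len] subword IDs of the entity description.
        return F.normalize(self.entity_proj(self.embed(token_ids)), dim=-1)


def in_batch_loss(model: DualEncoder,
                  mention_ids: torch.Tensor,
                  entity_ids: torch.Tensor) -> torch.Tensor:
    """Softmax over the batch: each mention's gold entity is the positive,
    and every other entity in the batch serves as a negative."""
    m = model.encode_mentions(mention_ids)            # [B, dim]
    e = model.encode_entities(entity_ids)             # [B, dim]
    scores = m @ e.T                                  # [B, B] similarity matrix
    targets = torch.arange(scores.size(0), device=scores.device)  # diagonal = gold
    return F.cross_entropy(scores, targets)
```

At inference time, the entity embeddings would be precomputed once for the whole Knowledge Base, and each mention is linked by retrieving its nearest entity vector by dot product.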

Code Repositories

hazyresearch/tabi (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
entity-disambiguation-on-mewsli-9 | Model F+ | Micro Precision: 89.0
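Micro precision here is assumed to be a mention-level average pooled over the whole evaluation set, i.e. the fraction of mentions whose top-ranked entity matches the gold entity. A minimal sketch, assuming one top-1 prediction per mention:

```python
# Hedged sketch of micro-averaged precision@1 over pooled mention-level
# predictions; variable names are illustrative.
def micro_precision(predicted_entities, gold_entities):
    correct = sum(p == g for p, g in zip(predicted_entities, gold_entities))
    return correct / len(gold_entities) if gold_entities else 0.0

# A reported value of 89.0 would correspond to 89% of mentions linked correctly.
```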
