BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych

Abstract

Existing neural information retrieval (IR) models have often been studied in homogeneous and narrow settings, which has considerably limited insight into their out-of-distribution (OOD) generalization capabilities. To address this, and to enable researchers to broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval. We leverage a careful selection of 18 publicly available datasets from diverse text retrieval tasks and domains, and evaluate 10 state-of-the-art retrieval systems, including lexical, sparse, dense, late-interaction, and re-ranking architectures, on the BEIR benchmark. Our results show that BM25 is a robust baseline and that re-ranking and late-interaction models achieve the best zero-shot performance on average, albeit at high computational cost. In contrast, dense and sparse retrieval models are computationally more efficient but often underperform the other approaches, highlighting the considerable room for improvement in their generalization capabilities. We hope this framework allows us to better evaluate and understand existing retrieval systems, and contributes to accelerating progress towards more robust and generalizable systems in the future. BEIR is publicly available at https://github.com/UKPLab/beir.
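
To make the evaluation setup concrete, here is a minimal sketch of a zero-shot run with the beir package, following the quickstart pattern in the linked repository. The dataset URL and model checkpoint name are illustrative and may have changed since release:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Download and unzip one BEIR dataset (SciFact is among the smallest).
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")

# corpus: doc_id -> {"title": ..., "text": ...}; queries: query_id -> text;
# qrels: query_id -> {doc_id: relevance}
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# Zero-shot dense retrieval with a SentenceTransformers checkpoint
# (any SBERT-compatible model name works here).
model = DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="dot")

results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)  # e.g. {"NDCG@1": ..., "NDCG@10": ..., ...}
```

Swapping in another architecture (BM25, ColBERT, a cross-encoder re-ranker) changes only the model construction; the loading and evaluation steps stay the same, which is what makes the zero-shot comparison across the 18 datasets uniform.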

Code Repositories

beir-cellar/beir (PyTorch)
UKPLab/beir (official, TensorFlow)
osu-nlp-group/hipporag

Benchmarks

Benchmark | Methodology | Metric
passage-retrieval-on-msmarco-beir | ColBERT | nDCG@10: 0.401
passage-retrieval-on-msmarco-beir | BM25 | nDCG@10: 0.228
passage-retrieval-on-msmarco-beir | SPARTA | nDCG@10: 0.351
passage-retrieval-on-msmarco-beir | BM25+CE | nDCG@10: 0.413
passage-retrieval-on-msmarco-beir | docT5query | nDCG@10: 0.338
passage-retrieval-on-msmarco-beir | ANCE | nDCG@10: 0.388
passage-retrieval-on-msmarco-beir | DeepCT | nDCG@10: 0.296
passage-retrieval-on-msmarco-beir | TAS-b | nDCG@10: 0.408
passage-retrieval-on-msmarco-beir | DPR | nDCG@10: 0.177
question-answering-on-fiqa-2018-beir | BM25+CE | nDCG@10: 0.347
question-answering-on-hotpotqa-beir | BM25+CE | nDCG@10: 0.707
question-answering-on-nq-beir | ColBERT | nDCG@10: 0.524
question-answering-on-nq-beir | BM25+CE | nDCG@10: 0.533

