3 months ago

BM25S: Orders of magnitude faster lexical search via eager sparse scoring

Xing Han Lù

Abstract

We introduce BM25S, an efficient Python-based implementation of BM25 that only depends on NumPy and SciPy. BM25S achieves up to a 500x speedup compared to the most popular Python-based framework by eagerly computing BM25 scores during indexing and storing them into sparse matrices. It also achieves considerable speedups compared to highly optimized Java-based implementations, which are used by popular commercial products. Finally, BM25S reproduces the exact implementation of five BM25 variants based on Kamphuis et al. (2020) by extending eager scoring to non-sparse variants using a novel score shifting method. The code can be found at https://github.com/xhluca/bm25s
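
The eager scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the BM25S implementation: the function names `build_index` and `retrieve` are hypothetical, a plain dictionary of dictionaries stands in for the SciPy sparse matrix, tokenization is a simple whitespace split, and the scoring formula is assumed to be a Lucene-style BM25 with illustrative values for `k1` and `b`. The point it shows is that every (term, document) score is computed once at indexing time, so a query only needs to look up and sum stored values.

```python
import math
from collections import Counter, defaultdict


def build_index(corpus_tokens, k1=1.5, b=0.75):
    """Eagerly compute a BM25 score for every (term, document) pair that occurs.

    Returns a mapping term -> {doc_id: score}. Pairs that never occur are not
    stored, which is what makes the structure sparse.
    """
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    index = defaultdict(dict)
    for doc_id, doc in enumerate(corpus_tokens):
        # Length normalization term for this document.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        for term, freq in Counter(doc).items():
            # Assumed Lucene-style IDF, which is always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            index[term][doc_id] = idf * freq / (freq + norm)
    return index


def retrieve(index, query_tokens, n_docs, k=3):
    """Sum the precomputed scores of the query terms and return the top k."""
    scores = [0.0] * n_docs
    for term in query_tokens:
        for doc_id, score in index.get(term, {}).items():
            scores[doc_id] += score
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)[:k]
    return [(doc_id, scores[doc_id]) for doc_id in ranked]


corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend",
    "a bird is a beautiful animal that can fly",
]
corpus_tokens = [doc.lower().split() for doc in corpus]
index = build_index(corpus_tokens)
top = retrieve(index, "does the cat purr".split(), n_docs=len(corpus_tokens), k=2)
print(top)
```

At query time no term frequency, document length, or IDF is recomputed. The work reduces to gathering stored scores per query term and adding them per document, which in a sparse-matrix setting corresponds to selecting and summing a few rows or columns.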

Benchmarks

Benchmark                               Methodology     Metrics
retrieval-on-hotpotqa                   BM25S           Queries per second: 20.88
retrieval-on-hotpotqa                   Elasticsearch   Queries per second: 7.11
retrieval-on-hotpotqa                   Rank-BM25       Queries per second: 0.04
retrieval-on-natural-questions          Elasticsearch   Queries per second: 12.16
retrieval-on-natural-questions          Rank-BM25       Queries per second: 0.10
retrieval-on-natural-questions          BM25S           Queries per second: 41.85
retrieval-on-quora-question-pairs       Elasticsearch   Queries per second: 21.8
retrieval-on-quora-question-pairs       BM25-PT         Queries per second: 6.49
retrieval-on-quora-question-pairs       Rank-BM25       Queries per second: 1.18
retrieval-on-quora-question-pairs       BM25S           Queries per second: 183.53
text-retrieval-on-climate-fever         Lucene (BM25S)  nDCG@10: 16.2
text-retrieval-on-dbpedia               Lucene (BM25S)  nDCG@10: 31.9
text-retrieval-on-fever                 Lucene (BM25S)  nDCG@10: 63.8
text-retrieval-on-hotpotqa              Lucene (BM25S)  nDCG@10: 62.9
text-retrieval-on-ms-marco              Lucene (BM25S)  nDCG@10: 22.8
text-retrieval-on-natural-questions     Lucene (BM25S)  nDCG@10: 30.5
text-retrieval-on-nfcorpus              Lucene (BM25S)  nDCG@10: 31.8
text-retrieval-on-quora-question-pairs  Lucene (BM25S)  nDCG@10: 78.7
text-retrieval-on-scidocs               Lucene (BM25S)  nDCG@10: 67.6
text-retrieval-on-scifact               Lucene (BM25S)  nDCG@10: 15
text-retrieval-on-trec-covid            Lucene (BM25S)  nDCG@10: 58.9
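
To relate the throughput figures to the speedup claimed in the abstract, the ratios can be worked out directly from the queries-per-second values listed under Benchmarks. The short sketch below does only that arithmetic; the helper name `speedups` is hypothetical. Since the listed values are rounded to two decimals (for example 0.04 for Rank-BM25 on HotpotQA), the resulting ratios should be read as approximate.

```python
# Queries per second, copied from the benchmark listing.
qps = {
    "hotpotqa": {"BM25S": 20.88, "Elasticsearch": 7.11, "Rank-BM25": 0.04},
    "natural-questions": {"BM25S": 41.85, "Elasticsearch": 12.16, "Rank-BM25": 0.10},
    "quora-question-pairs": {
        "BM25S": 183.53,
        "Elasticsearch": 21.8,
        "BM25-PT": 6.49,
        "Rank-BM25": 1.18,
    },
}


def speedups(rows, reference="BM25S"):
    """Ratio of the reference throughput to each other method's throughput."""
    return {
        name: rows[reference] / value
        for name, value in rows.items()
        if name != reference
    }


for dataset, rows in qps.items():
    for name, ratio in speedups(rows).items():
        print(f"{dataset}: BM25S handles about {ratio:.1f}x the queries per second of {name}")
```

On these numbers the largest ratio is against Rank-BM25 on HotpotQA, at roughly 500x, while the ratios against Elasticsearch stay in the single digits.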