BM25S: Orders of magnitude faster lexical search via eager sparse scoring
Xing Han Lù

Abstract
We introduce BM25S, an efficient Python-based implementation of BM25 that only depends on Numpy and Scipy. BM25S achieves up to a 500x speedup compared to the most popular Python-based framework by eagerly computing BM25 scores during indexing and storing them into sparse matrices. It also achieves considerable speedups compared to highly optimized Java-based implementations, which are used by popular commercial products. Finally, BM25S reproduces the exact implementation of five BM25 variants based on Kamphuis et al. (2020) by extending eager scoring to non-sparse variants using a novel score shifting method. The code can be found at https://github.com/xhluca/bm25s
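To make the idea of eager sparse scoring concrete, here is a minimal sketch using only NumPy and SciPy. It is not the BM25S library's code or API: the function names `build_index` and `search`, the default `k1` and `b` values, and the Lucene-style scoring formula are assumptions chosen for illustration. The point it shows is that every (term, document) score can be computed once at indexing time and stored in a sparse matrix, so a query reduces to selecting rows and summing them.

```python
import numpy as np
from scipy import sparse


def build_index(corpus_tokens, k1=1.5, b=0.75):
    """Eagerly score every (term, document) pair that occurs in the corpus.

    Hypothetical helper for illustration; assumes a non-empty, pre-tokenized corpus.
    """
    vocab = {}
    rows, cols, tfs = [], [], []
    for doc_id, doc in enumerate(corpus_tokens):
        counts = {}
        for token in doc:
            term_id = vocab.setdefault(token, len(vocab))
            counts[term_id] = counts.get(term_id, 0) + 1
        for term_id, tf in counts.items():
            rows.append(term_id)
            cols.append(doc_id)
            tfs.append(tf)

    rows = np.array(rows, dtype=np.int64)
    cols = np.array(cols, dtype=np.int64)
    tfs = np.array(tfs, dtype=np.float64)

    n_docs = len(corpus_tokens)
    doc_lens = np.array([len(doc) for doc in corpus_tokens], dtype=np.float64)
    # Each stored (term, doc) pair is unique, so counting rows gives document frequency.
    df = np.bincount(rows, minlength=len(vocab))

    # Assumed scoring: Lucene-style IDF and term-frequency saturation.
    idf = np.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    norm = k1 * (1.0 - b + b * doc_lens[cols] / doc_lens.mean())
    scores = idf[rows] * tfs / (tfs + norm)

    # Rows are terms, columns are documents; absent pairs are implicit zeros.
    matrix = sparse.csr_matrix((scores, (rows, cols)), shape=(len(vocab), n_docs))
    return vocab, matrix


def search(query_tokens, vocab, matrix, k=3):
    """Sum the precomputed rows of the query terms and return the top-k documents."""
    term_ids = [vocab[t] for t in query_tokens if t in vocab]
    if not term_ids:
        return np.array([], dtype=np.int64), np.array([], dtype=np.float64)
    doc_scores = np.asarray(matrix[term_ids].sum(axis=0)).ravel()
    top = np.argsort(-doc_scores)[:k]
    return top, doc_scores[top]


corpus = [
    ["a", "cat", "sat"],
    ["a", "dog", "barked"],
    ["the", "cat", "purred", "cat"],
]
vocab, matrix = build_index(corpus)
doc_ids, doc_scores = search(["cat"], vocab, matrix, k=2)
print(doc_ids, doc_scores)
```

Storing scores rather than raw term counts moves the cost of the BM25 formula from query time to indexing time. This sketch only covers a variant whose score is zero when a term is absent from a document; the score shifting method the abstract mentions for non-sparse variants is not reproduced here.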
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| retrieval-on-hotpotqa | BM25S | Queries per second: 20.88 |
| retrieval-on-hotpotqa | Elasticsearch | Queries per second: 7.11 |
| retrieval-on-hotpotqa | Rank-BM25 | Queries per second: 0.04 |
| retrieval-on-natural-questions | Elasticsearch | Queries per second: 12.16 |
| retrieval-on-natural-questions | Rank-BM25 | Queries per second: 0.10 |
| retrieval-on-natural-questions | BM25S | Queries per second: 41.85 |
| retrieval-on-quora-question-pairs | Elasticsearch | Queries per second: 21.8 |
| retrieval-on-quora-question-pairs | BM25-PT | Queries per second: 6.49 |
| retrieval-on-quora-question-pairs | Rank-BM25 | Queries per second: 1.18 |
| retrieval-on-quora-question-pairs | BM25S | Queries per second: 183.53 |
| text-retrieval-on-climate-fever | Lucene (BM25S) | nDCG@10: 16.2 |
| text-retrieval-on-dbpedia | Lucene (BM25S) | nDCG@10: 31.9 |
| text-retrieval-on-fever | Lucene (BM25S) | nDCG@10: 63.8 |
| text-retrieval-on-hotpotqa | Lucene (BM25S) | nDCG@10: 62.9 |
| text-retrieval-on-ms-marco | Lucene (BM25S) | nDCG@10: 22.8 |
| text-retrieval-on-natural-questions | Lucene (BM25S) | nDCG@10: 30.5 |
| text-retrieval-on-nfcorpus | Lucene (BM25S) | nDCG@10: 31.8 |
| text-retrieval-on-quora-question-pairs | Lucene (BM25S) | nDCG@10: 78.7 |
| text-retrieval-on-scidocs | Lucene (BM25S) | nDCG@10: 67.6 |
| text-retrieval-on-scifact | Lucene (BM25S) | nDCG@10: 15 |
| text-retrieval-on-trec-covid | Lucene (BM25S) | nDCG@10: 58.9 |
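The throughput rows above can be turned into speedup ratios with a few lines of arithmetic. The sketch below copies the queries-per-second figures from the table; the `speedups` helper is a hypothetical name introduced for illustration. Because the table values are rounded (for example 0.04 for Rank-BM25 on HotpotQA), the resulting ratios are approximate.

```python
# Queries per second, copied from the benchmark table.
qps = {
    "hotpotqa": {"BM25S": 20.88, "Elasticsearch": 7.11, "Rank-BM25": 0.04},
    "natural-questions": {"BM25S": 41.85, "Elasticsearch": 12.16, "Rank-BM25": 0.10},
    "quora-question-pairs": {
        "BM25S": 183.53,
        "Elasticsearch": 21.8,
        "BM25-PT": 6.49,
        "Rank-BM25": 1.18,
    },
}


def speedups(results, system="BM25S"):
    """Return how many times faster `system` is than each other entry."""
    base = results[system]
    return {name: base / value for name, value in results.items() if name != system}


for dataset, results in qps.items():
    for name, ratio in speedups(results).items():
        print(f"{dataset}: BM25S is about {ratio:.1f}x faster than {name}")
```

On these rounded figures, the ratio against Rank-BM25 on HotpotQA comes out at roughly 520x, the same order as the "up to 500x" speedup stated in the abstract, while the ratios against Elasticsearch range from about 3x to about 8x.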