MTEB: Massive Text Embedding Benchmark

Niklas Muennighoff, Nouamane Tazi, Loïc Magne, Nils Reimers

Abstract

Text embeddings are commonly evaluated on a small set of datasets from a single task not covering their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. Through the benchmarking of 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no particular text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and scale it up sufficiently to provide state-of-the-art results on all embedding tasks. MTEB comes with open-source code and a public leaderboard at https://github.com/embeddings-benchmark/mteb.
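
The claim that no single method dominates can be checked roughly against the scores listed in the Benchmarks section of this page. The sketch below is illustrative only: it copies the scores of five models (an arbitrary subset chosen for this example) and picks the top scorer per task. The `scores` layout and the `best_per_task` helper are hypothetical names introduced here, not part of the MTEB code base.

```python
# Scores copied from the Benchmarks listing on this page (subset of five models).
# Task keys follow the benchmark names used in that listing.
scores = {
    "semantic-textual-similarity": {  # Spearman correlation
        "ST5-XXL": 82.63, "MPNet": 80.28, "GTR-XXL": 78.38,
        "SGPT-5.8B-msmarco": 78.1, "MPNet-multilingual": 80.73,
    },
    "text-classification": {  # accuracy
        "ST5-XXL": 73.42, "MPNet": 65.07, "GTR-XXL": 67.41,
        "SGPT-5.8B-msmarco": 68.13, "MPNet-multilingual": 67.91,
    },
    "text-clustering": {  # V-measure
        "ST5-XXL": 43.71, "MPNet": 43.69, "GTR-XXL": 42.42,
        "SGPT-5.8B-msmarco": 40.35, "MPNet-multilingual": 38.4,
    },
    "text-retrieval": {  # nDCG@10
        "ST5-XXL": 42.24, "MPNet": 43.81, "GTR-XXL": 48.48,
        "SGPT-5.8B-msmarco": 50.25, "MPNet-multilingual": 35.34,
    },
    "text-summarization": {  # Spearman correlation
        "ST5-XXL": 30.08, "MPNet": 27.49, "GTR-XXL": 30.64,
        "SGPT-5.8B-msmarco": 24.75, "MPNet-multilingual": 31.57,
    },
}


def best_per_task(task_scores):
    """Return the highest-scoring model for each task (hypothetical helper)."""
    return {task: max(models, key=models.get) for task, models in task_scores.items()}


winners = best_per_task(scores)
for task, model in winners.items():
    print(f"{task}: {model} ({scores[task][model]})")

# More than one distinct winner means no model in this subset leads everywhere.
print("distinct winners:", len(set(winners.values())))
```

Within this subset, the retrieval and summarization leaders differ from the model that leads the other three tasks, which is consistent with the abstract's finding. The metrics are not comparable across tasks, so the sketch compares models only within a task.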

Code Repositories

climsocana/tecb-de (Mentioned in GitHub)
embeddings-benchmark/mteb (Official, pytorch, Mentioned in GitHub)
basf/chemteb (pytorch, Mentioned in GitHub)
lyon-nlp/mteb-french (pytorch, Mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| information-retrieval-on-mteb | SGPT-5.8B-msmarco | nDCG@10: 50.25 |
| semantic-textual-similarity-on-mteb | Ada Similarity | Spearman Correlation: 78.6 |
| semantic-textual-similarity-on-mteb | GTR-Large | Spearman Correlation: 78.19 |
| semantic-textual-similarity-on-mteb | SGPT-2.7B-msmarco | Spearman Correlation: 76.83 |
| semantic-textual-similarity-on-mteb | GTR-Base | Spearman Correlation: 77.07 |
| semantic-textual-similarity-on-mteb | ST5-XXL | Spearman Correlation: 82.63 |
| semantic-textual-similarity-on-mteb | SimCSE-BERT-unsup | Spearman Correlation: 74.33 |
| semantic-textual-similarity-on-mteb | Komninos | Spearman Correlation: 62.47 |
| semantic-textual-similarity-on-mteb | SGPT-5.8B-nli | Spearman Correlation: 80.53 |
| semantic-textual-similarity-on-mteb | SGPT-5.8B-msmarco | Spearman Correlation: 78.1 |
| semantic-textual-similarity-on-mteb | SPECTER | Spearman Correlation: 61.02 |
| semantic-textual-similarity-on-mteb | GTR-XXL | Spearman Correlation: 78.38 |
| semantic-textual-similarity-on-mteb | MiniLM-L6 | Spearman Correlation: 78.92 |
| semantic-textual-similarity-on-mteb | SimCSE-BERT-sup | Spearman Correlation: 79.12 |
| semantic-textual-similarity-on-mteb | LASER2 | Spearman Correlation: 55.32 |
| semantic-textual-similarity-on-mteb | coCondenser-msmarco | Spearman Correlation: 76.47 |
| semantic-textual-similarity-on-mteb | GTR-XL | Spearman Correlation: 77.8 |
| semantic-textual-similarity-on-mteb | ST5-Large | Spearman Correlation: 81.83 |
| semantic-textual-similarity-on-mteb | LaBSE | Spearman Correlation: 70.8 |
| semantic-textual-similarity-on-mteb | MPNet | Spearman Correlation: 80.28 |
| semantic-textual-similarity-on-mteb | BERT | Spearman Correlation: 54.36 |
| semantic-textual-similarity-on-mteb | SGPT-1.3B-msmarco | Spearman Correlation: 75.74 |
| semantic-textual-similarity-on-mteb | SGPT-125M-msmarco | Spearman Correlation: 73.41 |
| semantic-textual-similarity-on-mteb | ST5-XL | Spearman Correlation: 81.66 |
| semantic-textual-similarity-on-mteb | MiniLM-L12 | Spearman Correlation: 79.8 |
| semantic-textual-similarity-on-mteb | ST5-Base | Spearman Correlation: 81.14 |
| semantic-textual-similarity-on-mteb | MPNet-multilingual | Spearman Correlation: 80.73 |
| semantic-textual-similarity-on-mteb | Glove | Spearman Correlation: 61.85 |
| semantic-textual-similarity-on-mteb | SGPT-BLOOM-7.1B-msmarco | Spearman Correlation: 77.74 |
| semantic-textual-similarity-on-mteb | SGPT-125M-nli | Spearman Correlation: 74.71 |
| text-classification-on-mteb | GTR-Large | Accuracy: 67.14 |
| text-classification-on-mteb | ST5-XL | Accuracy: 72.84 |
| text-classification-on-mteb | ST5-XXL | Accuracy: 73.42 |
| text-classification-on-mteb | LaBSE | Accuracy: 62.71 |
| text-classification-on-mteb | SGPT-125M-nli | Accuracy: 61.46 |
| text-classification-on-mteb | SGPT-5.8B-nli | Accuracy: 70.14 |
| text-classification-on-mteb | Ada Similarity | Accuracy: 70.44 |
| text-classification-on-mteb | MiniLM-L6 | Accuracy: 63.06 |
| text-classification-on-mteb | coCondenser-msmarco | Accuracy: 64.71 |
| text-classification-on-mteb | ST5-Base | Accuracy: 69.81 |
| text-classification-on-mteb | SGPT-BLOOM-7.1B-msmarco | Accuracy: 66.19 |
| text-classification-on-mteb | MPNet-multilingual | Accuracy: 67.91 |
| text-classification-on-mteb | SPECTER | Accuracy: 52.37 |
| text-classification-on-mteb | GTR-XXL | Accuracy: 67.41 |
| text-classification-on-mteb | MPNet | Accuracy: 65.07 |
| text-classification-on-mteb | Komninos | Accuracy: 57.65 |
| text-classification-on-mteb | SimCSE-BERT-unsup | Accuracy: 62.5 |
| text-classification-on-mteb | BERT | Accuracy: 61.66 |
| text-classification-on-mteb | MiniLM-L12-multilingual | Accuracy: 64.3 |
| text-classification-on-mteb | LASER2 | Accuracy: 53.65 |
| text-classification-on-mteb | GTR-XL | Accuracy: 67.11 |
| text-classification-on-mteb | SGPT-125M-msmarco | Accuracy: 60.72 |
| text-classification-on-mteb | Contriever | Accuracy: 66.68 |
| text-classification-on-mteb | SimCSE-BERT-sup | Accuracy: 67.32 |
| text-classification-on-mteb | ST5-Large | Accuracy: 72.31 |
| text-classification-on-mteb | MiniLM-L12 | Accuracy: 63.21 |
| text-classification-on-mteb | Glove | Accuracy: 57.29 |
| text-classification-on-mteb | SGPT-1.3B-msmarco | Accuracy: 66.52 |
| text-classification-on-mteb | GTR-Base | Accuracy: 65.25 |
| text-classification-on-mteb | SGPT-2.7B-msmarco | Accuracy: 67.13 |
| text-classification-on-mteb | SGPT-5.8B-msmarco | Accuracy: 68.13 |
| text-clustering-on-mteb | SPECTER | V-Measure: 34.06 |
| text-clustering-on-mteb | coCondenser-msmarco | V-Measure: 37.64 |
| text-clustering-on-mteb | ST5-XL | V-Measure: 42.34 |
| text-clustering-on-mteb | SGPT-1.3B-msmarco | V-Measure: 39.92 |
| text-clustering-on-mteb | GTR-XL | V-Measure: 41.51 |
| text-clustering-on-mteb | ST5-Base | V-Measure: 40.21 |
| text-clustering-on-mteb | SGPT-125M-msmarco | V-Measure: 35.79 |
| text-clustering-on-mteb | MPNet-multilingual | V-Measure: 38.4 |
| text-clustering-on-mteb | SGPT-125M-nli | V-Measure: 30.95 |
| text-clustering-on-mteb | Komninos | V-Measure: 26.57 |
| text-clustering-on-mteb | SGPT-2.7B-msmarco | V-Measure: 39.83 |
| text-clustering-on-mteb | SGPT-BLOOM-7.1B-msmarco | V-Measure: 38.93 |
| text-clustering-on-mteb | LASER2 | V-Measure: 15.28 |
| text-clustering-on-mteb | ST5-Large | V-Measure: 41.65 |
| text-clustering-on-mteb | SimCSE-BERT-unsup | V-Measure: 29.04 |
| text-clustering-on-mteb | SGPT-5.8B-nli | V-Measure: 36.98 |
| text-clustering-on-mteb | Glove | V-Measure: 27.73 |
| text-clustering-on-mteb | MiniLM-L12 | V-Measure: 41.81 |
| text-clustering-on-mteb | LaBSE | V-Measure: 29.55 |
| text-clustering-on-mteb | MiniLM-L6 | V-Measure: 42.35 |
| text-clustering-on-mteb | BERT | V-Measure: 30.12 |
| text-clustering-on-mteb | SGPT-5.8B-msmarco | V-Measure: 40.35 |
| text-clustering-on-mteb | MiniLM-L12-multilingual | V-Measure: 37.14 |
| text-clustering-on-mteb | MPNet | V-Measure: 43.69 |
| text-clustering-on-mteb | ST5-XXL | V-Measure: 43.71 |
| text-clustering-on-mteb | SimCSE-BERT-sup | V-Measure: 33.43 |
| text-clustering-on-mteb | Ada Similarity | V-Measure: 37.52 |
| text-clustering-on-mteb | GTR-Base | V-Measure: 38.63 |
| text-clustering-on-mteb | Contriever | V-Measure: 41.1 |
| text-clustering-on-mteb | GTR-XXL | V-Measure: 42.42 |
| text-clustering-on-mteb | GTR-Large | V-Measure: 41.6 |
| text-retrieval-on-mteb | BERT | nDCG@10: 10.59 |
| text-retrieval-on-mteb | ST5-XL | nDCG@10: 38.47 |
| text-retrieval-on-mteb | MPNet-multilingual | nDCG@10: 35.34 |
| text-retrieval-on-mteb | SPECTER | nDCG@10: 15.88 |
| text-retrieval-on-mteb | MiniLM-L12 | nDCG@10: 42.69 |
| text-retrieval-on-mteb | GTR-Large | nDCG@10: 47.42 |
| text-retrieval-on-mteb | coCondenser-msmarco | nDCG@10: 32.96 |
| text-retrieval-on-mteb | ST5-Large | nDCG@10: 36.71 |
| text-retrieval-on-mteb | Glove | nDCG@10: 21.62 |
| text-retrieval-on-mteb | LaBSE | nDCG@10: 18.99 |
| text-retrieval-on-mteb | MiniLM-L6 | nDCG@10: 41.95 |
| text-retrieval-on-mteb | SGPT-5.8B-nli | nDCG@10: 32.34 |
| text-retrieval-on-mteb | MPNet | nDCG@10: 43.81 |
| text-retrieval-on-mteb | GTR-Base | nDCG@10: 44.67 |
| text-retrieval-on-mteb | GTR-XXL | nDCG@10: 48.48 |
| text-retrieval-on-mteb | LASER2 | nDCG@10: 7.93 |
| text-retrieval-on-mteb | ST5-XXL | nDCG@10: 42.24 |
| text-retrieval-on-mteb | SGPT-BLOOM-7.1B-msmarco | nDCG@10: 48.21 |
| text-retrieval-on-mteb | SGPT-1.3B-msmarco | nDCG@10: 44.49 |
| text-retrieval-on-mteb | GTR-XL | nDCG@10: 47.96 |
| text-retrieval-on-mteb | SimCSE-BERT-sup | nDCG@10: 21.82 |
| text-retrieval-on-mteb | SimCSE-BERT-unsup | nDCG@10: 20.29 |
| text-retrieval-on-mteb | Komninos | nDCG@10: 21.22 |
| text-retrieval-on-mteb | SGPT-2.7B-msmarco | nDCG@10: 46.54 |
| text-retrieval-on-mteb | SGPT-125M-msmarco | nDCG@10: 37.04 |
| text-retrieval-on-mteb | SGPT-125M-nli | nDCG@10: 20.9 |
| text-retrieval-on-mteb | SGPT-5.8B-msmarco | nDCG@10: 50.25 |
| text-retrieval-on-mteb | MiniLM-L12-multilingual | nDCG@10: 32.45 |
| text-retrieval-on-mteb | ST5-Base | nDCG@10: 33.63 |
| text-retrieval-on-mteb | Contriever | nDCG@10: 41.88 |
| text-summarization-on-mteb | LASER2 | Spearman Correlation: 26.8 |
| text-summarization-on-mteb | Contriever | Spearman Correlation: 30.36 |
| text-summarization-on-mteb | GTR-XL | Spearman Correlation: 30.21 |
| text-summarization-on-mteb | ST5-Large | Spearman Correlation: 29.64 |
| text-summarization-on-mteb | ST5-Base | Spearman Correlation: 31.39 |
| text-summarization-on-mteb | Glove | Spearman Correlation: 28.87 |
| text-summarization-on-mteb | MPNet-multilingual | Spearman Correlation: 31.57 |
| text-summarization-on-mteb | Komninos | Spearman Correlation: 30.49 |
| text-summarization-on-mteb | SGPT-BLOOM-7.1B-msmarco | Spearman Correlation: 24.99 |
| text-summarization-on-mteb | GTR-Base | Spearman Correlation: 29.67 |
| text-summarization-on-mteb | MiniLM-L6 | Spearman Correlation: 30.81 |
| text-summarization-on-mteb | SimCSE-BERT-unsup | Spearman Correlation: 31.15 |
| text-summarization-on-mteb | SimCSE-BERT-sup | Spearman Correlation: 23.31 |
| text-summarization-on-mteb | GTR-XXL | Spearman Correlation: 30.64 |
| text-summarization-on-mteb | MiniLM-L12 | Spearman Correlation: 27.9 |
| text-summarization-on-mteb | coCondenser-msmarco | Spearman Correlation: 29.5 |
| text-summarization-on-mteb | SGPT-5.8B-msmarco | Spearman Correlation: 24.75 |
| text-summarization-on-mteb | SGPT-125M-nli | Spearman Correlation: 30.26 |
| text-summarization-on-mteb | MPNet | Spearman Correlation: 27.49 |
| text-summarization-on-mteb | ST5-XL | Spearman Correlation: 29.91 |
| text-summarization-on-mteb | Ada Similarity | Spearman Correlation: 26.94 |
| text-summarization-on-mteb | SGPT-1.3B-msmarco | Spearman Correlation: 25.44 |
| text-summarization-on-mteb | MiniLM-L12-multilingual | Spearman Correlation: 30.67 |
| text-summarization-on-mteb | BERT | Spearman Correlation: 29.82 |
| text-summarization-on-mteb | ST5-XXL | Spearman Correlation: 30.08 |
| text-summarization-on-mteb | SPECTER | Spearman Correlation: 27.66 |
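
The retrieval rows above report nDCG@10. For readers unfamiliar with the metric, here is a minimal pure-Python sketch of how nDCG@k can be computed for a single query. It is not the MTEB implementation: the function names are hypothetical, and a linear gain (relevance divided by a log2 rank discount) is assumed, whereas some implementations use an exponential gain instead. The scores on this page appear to be on a 0 to 100 scale, while this sketch returns a value between 0 and 1.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels (linear gain assumed)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances=None, k=10):
    """nDCG@k for one query.

    ranked_relevances: relevance labels of the retrieved documents, in ranked order.
    all_relevances: labels of every judged document for the query; when omitted,
        the retrieved list is assumed to contain all judged documents.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# A relevant document at rank 1 gives a perfect score; at rank 2 it is discounted.
print(ndcg_at_k([1, 0, 0]))
print(ndcg_at_k([0, 1, 0]))
```

A benchmark-level figure would typically be the mean of this per-query value over all queries of a dataset, and then over datasets.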
