MTEB: Massive Text Embedding Benchmark

Abstract

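Text embeddings are commonly evaluated on a small set of datasets from a single task, which does not cover their possible applications to other tasks. It is unclear whether embeddings that are state of the art on semantic textual similarity (STS) can be applied equally well to other tasks such as clustering or reranking. This makes progress in the field difficult to track, as new models are constantly being proposed without systematic evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. By benchmarking 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no single text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and to scale it up sufficiently to provide state-of-the-art results on all embedding tasks. MTEB comes with open-source code and a public leaderboard at https://github.com/embeddings-benchmark/mteb.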
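To make the STS setting concrete, the sketch below scores a set of sentence pairs by the cosine similarity of their embeddings and then measures the Spearman rank correlation against human similarity ratings. It is a minimal illustration, not the benchmark's own code: the function names, the two-dimensional toy vectors, and the gold ratings are all hypothetical, and cosine similarity is assumed as the similarity function.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def sts_score(embedding_pairs, gold_ratings):
    """Correlate embedding similarities with human ratings."""
    sims = [cosine(u, v) for u, v in embedding_pairs]
    return spearman(sims, gold_ratings)

# Hypothetical toy data: four sentence pairs as 2-D embeddings.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),   # identical direction
    ([1.0, 0.0], [1.0, 1.0]),   # 45 degrees apart
    ([1.0, 0.0], [0.0, 1.0]),   # orthogonal
    ([1.0, 0.0], [-1.0, 0.0]),  # opposite direction
]
toy_gold = [5.0, 3.0, 1.0, 0.0]  # hypothetical human ratings

score = sts_score(toy_pairs, toy_gold)
print(f"Spearman correlation: {score:.4f}")
```

Because the metric only compares rankings, a model can score well on STS without its similarity values being calibrated in any absolute sense, which is one reason a high STS score need not carry over to tasks such as clustering or retrieval.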

Code repositories

climsocana/tecb-de (mentioned in GitHub)
embeddings-benchmark/mteb (official, pytorch, mentioned in GitHub)
basf/chemteb (pytorch, mentioned in GitHub)
lyon-nlp/mteb-french (pytorch, mentioned in GitHub)

Benchmarks

Benchmark | Method | Metric
information-retrieval-on-mteb | SGPT-5.8B-msmarco | nDCG@10: 50.25
semantic-textual-similarity-on-mteb | Ada Similarity | Spearman Correlation: 78.6
semantic-textual-similarity-on-mteb | GTR-Large | Spearman Correlation: 78.19
semantic-textual-similarity-on-mteb | SGPT-2.7B-msmarco | Spearman Correlation: 76.83
semantic-textual-similarity-on-mteb | GTR-Base | Spearman Correlation: 77.07
semantic-textual-similarity-on-mteb | ST5-XXL | Spearman Correlation: 82.63
semantic-textual-similarity-on-mteb | SimCSE-BERT-unsup | Spearman Correlation: 74.33
semantic-textual-similarity-on-mteb | Komninos | Spearman Correlation: 62.47
semantic-textual-similarity-on-mteb | SGPT-5.8B-nli | Spearman Correlation: 80.53
semantic-textual-similarity-on-mteb | SGPT-5.8B-msmarco | Spearman Correlation: 78.1
semantic-textual-similarity-on-mteb | SPECTER | Spearman Correlation: 61.02
semantic-textual-similarity-on-mteb | GTR-XXL | Spearman Correlation: 78.38
semantic-textual-similarity-on-mteb | MiniLM-L6 | Spearman Correlation: 78.92
semantic-textual-similarity-on-mteb | SimCSE-BERT-sup | Spearman Correlation: 79.12
semantic-textual-similarity-on-mteb | LASER2 | Spearman Correlation: 55.32
semantic-textual-similarity-on-mteb | coCondenser-msmarco | Spearman Correlation: 76.47
semantic-textual-similarity-on-mteb | GTR-XL | Spearman Correlation: 77.8
semantic-textual-similarity-on-mteb | ST5-Large | Spearman Correlation: 81.83
semantic-textual-similarity-on-mteb | LaBSE | Spearman Correlation: 70.8
semantic-textual-similarity-on-mteb | MPNet | Spearman Correlation: 80.28
semantic-textual-similarity-on-mteb | BERT | Spearman Correlation: 54.36
semantic-textual-similarity-on-mteb | SGPT-1.3B-msmarco | Spearman Correlation: 75.74
semantic-textual-similarity-on-mteb | SGPT-125M-msmarco | Spearman Correlation: 73.41
semantic-textual-similarity-on-mteb | ST5-XL | Spearman Correlation: 81.66
semantic-textual-similarity-on-mteb | MiniLM-L12 | Spearman Correlation: 79.8
semantic-textual-similarity-on-mteb | ST5-Base | Spearman Correlation: 81.14
semantic-textual-similarity-on-mteb | MPNet-multilingual | Spearman Correlation: 80.73
semantic-textual-similarity-on-mteb | Glove | Spearman Correlation: 61.85
semantic-textual-similarity-on-mteb | SGPT-BLOOM-7.1B-msmarco | Spearman Correlation: 77.74
semantic-textual-similarity-on-mteb | SGPT-125M-nli | Spearman Correlation: 74.71
text-classification-on-mteb | GTR-Large | Accuracy: 67.14
text-classification-on-mteb | ST5-XL | Accuracy: 72.84
text-classification-on-mteb | ST5-XXL | Accuracy: 73.42
text-classification-on-mteb | LaBSE | Accuracy: 62.71
text-classification-on-mteb | SGPT-125M-nli | Accuracy: 61.46
text-classification-on-mteb | SGPT-5.8B-nli | Accuracy: 70.14
text-classification-on-mteb | Ada Similarity | Accuracy: 70.44
text-classification-on-mteb | MiniLM-L6 | Accuracy: 63.06
text-classification-on-mteb | coCondenser-msmarco | Accuracy: 64.71
text-classification-on-mteb | ST5-Base | Accuracy: 69.81
text-classification-on-mteb | SGPT-BLOOM-7.1B-msmarco | Accuracy: 66.19
text-classification-on-mteb | MPNet-multilingual | Accuracy: 67.91
text-classification-on-mteb | SPECTER | Accuracy: 52.37
text-classification-on-mteb | GTR-XXL | Accuracy: 67.41
text-classification-on-mteb | MPNet | Accuracy: 65.07
text-classification-on-mteb | Komninos | Accuracy: 57.65
text-classification-on-mteb | SimCSE-BERT-unsup | Accuracy: 62.5
text-classification-on-mteb | BERT | Accuracy: 61.66
text-classification-on-mteb | MiniLM-L12-multilingual | Accuracy: 64.3
text-classification-on-mteb | LASER2 | Accuracy: 53.65
text-classification-on-mteb | GTR-XL | Accuracy: 67.11
text-classification-on-mteb | SGPT-125M-msmarco | Accuracy: 60.72
text-classification-on-mteb | Contriever | Accuracy: 66.68
text-classification-on-mteb | SimCSE-BERT-sup | Accuracy: 67.32
text-classification-on-mteb | ST5-Large | Accuracy: 72.31
text-classification-on-mteb | MiniLM-L12 | Accuracy: 63.21
text-classification-on-mteb | Glove | Accuracy: 57.29
text-classification-on-mteb | SGPT-1.3B-msmarco | Accuracy: 66.52
text-classification-on-mteb | GTR-Base | Accuracy: 65.25
text-classification-on-mteb | SGPT-2.7B-msmarco | Accuracy: 67.13
text-classification-on-mteb | SGPT-5.8B-msmarco | Accuracy: 68.13
text-clustering-on-mteb | SPECTER | V-Measure: 34.06
text-clustering-on-mteb | coCondenser-msmarco | V-Measure: 37.64
text-clustering-on-mteb | ST5-XL | V-Measure: 42.34
text-clustering-on-mteb | SGPT-1.3B-msmarco | V-Measure: 39.92
text-clustering-on-mteb | GTR-XL | V-Measure: 41.51
text-clustering-on-mteb | ST5-Base | V-Measure: 40.21
text-clustering-on-mteb | SGPT-125M-msmarco | V-Measure: 35.79
text-clustering-on-mteb | MPNet-multilingual | V-Measure: 38.4
text-clustering-on-mteb | SGPT-125M-nli | V-Measure: 30.95
text-clustering-on-mteb | Komninos | V-Measure: 26.57
text-clustering-on-mteb | SGPT-2.7B-msmarco | V-Measure: 39.83
text-clustering-on-mteb | SGPT-BLOOM-7.1B-msmarco | V-Measure: 38.93
text-clustering-on-mteb | LASER2 | V-Measure: 15.28
text-clustering-on-mteb | ST5-Large | V-Measure: 41.65
text-clustering-on-mteb | SimCSE-BERT-unsup | V-Measure: 29.04
text-clustering-on-mteb | SGPT-5.8B-nli | V-Measure: 36.98
text-clustering-on-mteb | Glove | V-Measure: 27.73
text-clustering-on-mteb | MiniLM-L12 | V-Measure: 41.81
text-clustering-on-mteb | LaBSE | V-Measure: 29.55
text-clustering-on-mteb | MiniLM-L6 | V-Measure: 42.35
text-clustering-on-mteb | BERT | V-Measure: 30.12
text-clustering-on-mteb | SGPT-5.8B-msmarco | V-Measure: 40.35
text-clustering-on-mteb | MiniLM-L12-multilingual | V-Measure: 37.14
text-clustering-on-mteb | MPNet | V-Measure: 43.69
text-clustering-on-mteb | ST5-XXL | V-Measure: 43.71
text-clustering-on-mteb | SimCSE-BERT-sup | V-Measure: 33.43
text-clustering-on-mteb | Ada Similarity | V-Measure: 37.52
text-clustering-on-mteb | GTR-Base | V-Measure: 38.63
text-clustering-on-mteb | Contriever | V-Measure: 41.1
text-clustering-on-mteb | GTR-XXL | V-Measure: 42.42
text-clustering-on-mteb | GTR-Large | V-Measure: 41.6
text-retrieval-on-mteb | BERT | nDCG@10: 10.59
text-retrieval-on-mteb | ST5-XL | nDCG@10: 38.47
text-retrieval-on-mteb | MPNet-multilingual | nDCG@10: 35.34
text-retrieval-on-mteb | SPECTER | nDCG@10: 15.88
text-retrieval-on-mteb | MiniLM-L12 | nDCG@10: 42.69
text-retrieval-on-mteb | GTR-Large | nDCG@10: 47.42
text-retrieval-on-mteb | coCondenser-msmarco | nDCG@10: 32.96
text-retrieval-on-mteb | ST5-Large | nDCG@10: 36.71
text-retrieval-on-mteb | Glove | nDCG@10: 21.62
text-retrieval-on-mteb | LaBSE | nDCG@10: 18.99
text-retrieval-on-mteb | MiniLM-L6 | nDCG@10: 41.95
text-retrieval-on-mteb | SGPT-5.8B-nli | nDCG@10: 32.34
text-retrieval-on-mteb | MPNet | nDCG@10: 43.81
text-retrieval-on-mteb | GTR-Base | nDCG@10: 44.67
text-retrieval-on-mteb | GTR-XXL | nDCG@10: 48.48
text-retrieval-on-mteb | LASER2 | nDCG@10: 7.93
text-retrieval-on-mteb | ST5-XXL | nDCG@10: 42.24
text-retrieval-on-mteb | SGPT-BLOOM-7.1B-msmarco | nDCG@10: 48.21
text-retrieval-on-mteb | SGPT-1.3B-msmarco | nDCG@10: 44.49
text-retrieval-on-mteb | GTR-XL | nDCG@10: 47.96
text-retrieval-on-mteb | SimCSE-BERT-sup | nDCG@10: 21.82
text-retrieval-on-mteb | SimCSE-BERT-unsup | nDCG@10: 20.29
text-retrieval-on-mteb | Komninos | nDCG@10: 21.22
text-retrieval-on-mteb | SGPT-2.7B-msmarco | nDCG@10: 46.54
text-retrieval-on-mteb | SGPT-125M-msmarco | nDCG@10: 37.04
text-retrieval-on-mteb | SGPT-125M-nli | nDCG@10: 20.9
text-retrieval-on-mteb | SGPT-5.8B-msmarco | nDCG@10: 50.25
text-retrieval-on-mteb | MiniLM-L12-multilingual | nDCG@10: 32.45
text-retrieval-on-mteb | ST5-Base | nDCG@10: 33.63
text-retrieval-on-mteb | Contriever | nDCG@10: 41.88
text-summarization-on-mteb | LASER2 | Spearman Correlation: 26.8
text-summarization-on-mteb | Contriever | Spearman Correlation: 30.36
text-summarization-on-mteb | GTR-XL | Spearman Correlation: 30.21
text-summarization-on-mteb | ST5-Large | Spearman Correlation: 29.64
text-summarization-on-mteb | ST5-Base | Spearman Correlation: 31.39
text-summarization-on-mteb | Glove | Spearman Correlation: 28.87
text-summarization-on-mteb | MPNet-multilingual | Spearman Correlation: 31.57
text-summarization-on-mteb | Komninos | Spearman Correlation: 30.49
text-summarization-on-mteb | SGPT-BLOOM-7.1B-msmarco | Spearman Correlation: 24.99
text-summarization-on-mteb | GTR-Base | Spearman Correlation: 29.67
text-summarization-on-mteb | MiniLM-L6 | Spearman Correlation: 30.81
text-summarization-on-mteb | SimCSE-BERT-unsup | Spearman Correlation: 31.15
text-summarization-on-mteb | SimCSE-BERT-sup | Spearman Correlation: 23.31
text-summarization-on-mteb | GTR-XXL | Spearman Correlation: 30.64
text-summarization-on-mteb | MiniLM-L12 | Spearman Correlation: 27.9
text-summarization-on-mteb | coCondenser-msmarco | Spearman Correlation: 29.5
text-summarization-on-mteb | SGPT-5.8B-msmarco | Spearman Correlation: 24.75
text-summarization-on-mteb | SGPT-125M-nli | Spearman Correlation: 30.26
text-summarization-on-mteb | MPNet | Spearman Correlation: 27.49
text-summarization-on-mteb | ST5-XL | Spearman Correlation: 29.91
text-summarization-on-mteb | Ada Similarity | Spearman Correlation: 26.94
text-summarization-on-mteb | SGPT-1.3B-msmarco | Spearman Correlation: 25.44
text-summarization-on-mteb | MiniLM-L12-multilingual | Spearman Correlation: 30.67
text-summarization-on-mteb | BERT | Spearman Correlation: 29.82
text-summarization-on-mteb | ST5-XXL | Spearman Correlation: 30.08
text-summarization-on-mteb | SPECTER | Spearman Correlation: 27.66
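
The figures above can be used to check the abstract's observation that no single method leads on every task. The sketch below copies the scores of four models from the listing (a subset chosen for illustration, so the winners it reports are winners within that subset only) and picks the highest-scoring model per task. The dictionary layout and the helper name `best_per_task` are assumptions made for this example.

```python
# Scores copied from the listing above for four of the models.
# Each task uses its own metric (Spearman, accuracy, V-measure, nDCG@10),
# so values are only comparable within a task, never across tasks.
scores = {
    "ST5-XXL": {
        "semantic-textual-similarity": 82.63,
        "text-classification": 73.42,
        "text-clustering": 43.71,
        "text-retrieval": 42.24,
        "text-summarization": 30.08,
    },
    "SGPT-5.8B-msmarco": {
        "semantic-textual-similarity": 78.1,
        "text-classification": 68.13,
        "text-clustering": 40.35,
        "text-retrieval": 50.25,
        "text-summarization": 24.75,
    },
    "MPNet-multilingual": {
        "semantic-textual-similarity": 80.73,
        "text-classification": 67.91,
        "text-clustering": 38.4,
        "text-retrieval": 35.34,
        "text-summarization": 31.57,
    },
    "GTR-XXL": {
        "semantic-textual-similarity": 78.38,
        "text-classification": 67.41,
        "text-clustering": 42.42,
        "text-retrieval": 48.48,
        "text-summarization": 30.64,
    },
}

def best_per_task(table):
    """Return {task: model with the highest score on that task}."""
    tasks = list(next(iter(table.values())).keys())
    return {
        task: max(table, key=lambda model: table[model][task])
        for task in tasks
    }

winners = best_per_task(scores)
for task, model in winners.items():
    print(f"{task}: {model} ({scores[model][task]})")
```

Within this subset, three different models come out on top depending on the task, which is consistent with the finding stated in the abstract.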
