
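Abstract
Text embeddings are commonly evaluated on a small set of datasets from a single task, which does not cover their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be applied equally well to other tasks such as clustering or reranking. This makes progress in the field difficult to track, as new models are constantly being proposed without systematic evaluation. To address this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. By benchmarking 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no single text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and to scale it up sufficiently to provide state-of-the-art results on all embedding tasks. MTEB comes with open-source code and a public leaderboard at https://github.com/embeddings-benchmark/mteb.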
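Code repositories

| Repository | Tags |
|---|---|
| climsocana/tecb-de | Mentioned in GitHub |
| embeddings-benchmark/mteb | Official, PyTorch, mentioned in GitHub |
| basf/chemteb | PyTorch, mentioned in GitHub |
| wadoodabdul/clinical_ner_benchmark | Mentioned in GitHub |
| lyon-nlp/mteb-french | PyTorch, mentioned in GitHub |
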
Benchmarks

| Benchmark | Method | Metric |
|---|---|---|
| information-retrieval-on-mteb | SGPT-5.8B-msmarco | nDCG@10: 50.25 |
| semantic-textual-similarity-on-mteb | Ada Similarity | Spearman Correlation: 78.6 |
| semantic-textual-similarity-on-mteb | GTR-Large | Spearman Correlation: 78.19 |
| semantic-textual-similarity-on-mteb | SGPT-2.7B-msmarco | Spearman Correlation: 76.83 |
| semantic-textual-similarity-on-mteb | GTR-Base | Spearman Correlation: 77.07 |
| semantic-textual-similarity-on-mteb | ST5-XXL | Spearman Correlation: 82.63 |
| semantic-textual-similarity-on-mteb | SimCSE-BERT-unsup | Spearman Correlation: 74.33 |
| semantic-textual-similarity-on-mteb | Komninos | Spearman Correlation: 62.47 |
| semantic-textual-similarity-on-mteb | SGPT-5.8B-nli | Spearman Correlation: 80.53 |
| semantic-textual-similarity-on-mteb | SGPT-5.8B-msmarco | Spearman Correlation: 78.1 |
| semantic-textual-similarity-on-mteb | SPECTER | Spearman Correlation: 61.02 |
| semantic-textual-similarity-on-mteb | GTR-XXL | Spearman Correlation: 78.38 |
| semantic-textual-similarity-on-mteb | MiniLM-L6 | Spearman Correlation: 78.92 |
| semantic-textual-similarity-on-mteb | SimCSE-BERT-sup | Spearman Correlation: 79.12 |
| semantic-textual-similarity-on-mteb | LASER2 | Spearman Correlation: 55.32 |
| semantic-textual-similarity-on-mteb | coCondenser-msmarco | Spearman Correlation: 76.47 |
| semantic-textual-similarity-on-mteb | GTR-XL | Spearman Correlation: 77.8 |
| semantic-textual-similarity-on-mteb | ST5-Large | Spearman Correlation: 81.83 |
| semantic-textual-similarity-on-mteb | LaBSE | Spearman Correlation: 70.8 |
| semantic-textual-similarity-on-mteb | MPNet | Spearman Correlation: 80.28 |
| semantic-textual-similarity-on-mteb | BERT | Spearman Correlation: 54.36 |
| semantic-textual-similarity-on-mteb | SGPT-1.3B-msmarco | Spearman Correlation: 75.74 |
| semantic-textual-similarity-on-mteb | SGPT-125M-msmarco | Spearman Correlation: 73.41 |
| semantic-textual-similarity-on-mteb | ST5-XL | Spearman Correlation: 81.66 |
| semantic-textual-similarity-on-mteb | MiniLM-L12 | Spearman Correlation: 79.8 |
| semantic-textual-similarity-on-mteb | ST5-Base | Spearman Correlation: 81.14 |
| semantic-textual-similarity-on-mteb | MPNet-multilingual | Spearman Correlation: 80.73 |
| semantic-textual-similarity-on-mteb | Glove | Spearman Correlation: 61.85 |
| semantic-textual-similarity-on-mteb | SGPT-BLOOM-7.1B-msmarco | Spearman Correlation: 77.74 |
| semantic-textual-similarity-on-mteb | SGPT-125M-nli | Spearman Correlation: 74.71 |
| text-classification-on-mteb | GTR-Large | Accuracy: 67.14 |
| text-classification-on-mteb | ST5-XL | Accuracy: 72.84 |
| text-classification-on-mteb | ST5-XXL | Accuracy: 73.42 |
| text-classification-on-mteb | LaBSE | Accuracy: 62.71 |
| text-classification-on-mteb | SGPT-125M-nli | Accuracy: 61.46 |
| text-classification-on-mteb | SGPT-5.8B-nli | Accuracy: 70.14 |
| text-classification-on-mteb | Ada Similarity | Accuracy: 70.44 |
| text-classification-on-mteb | MiniLM-L6 | Accuracy: 63.06 |
| text-classification-on-mteb | coCondenser-msmarco | Accuracy: 64.71 |
| text-classification-on-mteb | ST5-Base | Accuracy: 69.81 |
| text-classification-on-mteb | SGPT-BLOOM-7.1B-msmarco | Accuracy: 66.19 |
| text-classification-on-mteb | MPNet-multilingual | Accuracy: 67.91 |
| text-classification-on-mteb | SPECTER | Accuracy: 52.37 |
| text-classification-on-mteb | GTR-XXL | Accuracy: 67.41 |
| text-classification-on-mteb | MPNet | Accuracy: 65.07 |
| text-classification-on-mteb | Komninos | Accuracy: 57.65 |
| text-classification-on-mteb | SimCSE-BERT-unsup | Accuracy: 62.5 |
| text-classification-on-mteb | BERT | Accuracy: 61.66 |
| text-classification-on-mteb | MiniLM-L12-multilingual | Accuracy: 64.3 |
| text-classification-on-mteb | LASER2 | Accuracy: 53.65 |
| text-classification-on-mteb | GTR-XL | Accuracy: 67.11 |
| text-classification-on-mteb | SGPT-125M-msmarco | Accuracy: 60.72 |
| text-classification-on-mteb | Contriever | Accuracy: 66.68 |
| text-classification-on-mteb | SimCSE-BERT-sup | Accuracy: 67.32 |
| text-classification-on-mteb | ST5-Large | Accuracy: 72.31 |
| text-classification-on-mteb | MiniLM-L12 | Accuracy: 63.21 |
| text-classification-on-mteb | Glove | Accuracy: 57.29 |
| text-classification-on-mteb | SGPT-1.3B-msmarco | Accuracy: 66.52 |
| text-classification-on-mteb | GTR-Base | Accuracy: 65.25 |
| text-classification-on-mteb | SGPT-2.7B-msmarco | Accuracy: 67.13 |
| text-classification-on-mteb | SGPT-5.8B-msmarco | Accuracy: 68.13 |
| text-clustering-on-mteb | SPECTER | V-Measure: 34.06 |
| text-clustering-on-mteb | coCondenser-msmarco | V-Measure: 37.64 |
| text-clustering-on-mteb | ST5-XL | V-Measure: 42.34 |
| text-clustering-on-mteb | SGPT-1.3B-msmarco | V-Measure: 39.92 |
| text-clustering-on-mteb | GTR-XL | V-Measure: 41.51 |
| text-clustering-on-mteb | ST5-Base | V-Measure: 40.21 |
| text-clustering-on-mteb | SGPT-125M-msmarco | V-Measure: 35.79 |
| text-clustering-on-mteb | MPNet-multilingual | V-Measure: 38.4 |
| text-clustering-on-mteb | SGPT-125M-nli | V-Measure: 30.95 |
| text-clustering-on-mteb | Komninos | V-Measure: 26.57 |
| text-clustering-on-mteb | SGPT-2.7B-msmarco | V-Measure: 39.83 |
| text-clustering-on-mteb | SGPT-BLOOM-7.1B-msmarco | V-Measure: 38.93 |
| text-clustering-on-mteb | LASER2 | V-Measure: 15.28 |
| text-clustering-on-mteb | ST5-Large | V-Measure: 41.65 |
| text-clustering-on-mteb | SimCSE-BERT-unsup | V-Measure: 29.04 |
| text-clustering-on-mteb | SGPT-5.8B-nli | V-Measure: 36.98 |
| text-clustering-on-mteb | Glove | V-Measure: 27.73 |
| text-clustering-on-mteb | MiniLM-L12 | V-Measure: 41.81 |
| text-clustering-on-mteb | LaBSE | V-Measure: 29.55 |
| text-clustering-on-mteb | MiniLM-L6 | V-Measure: 42.35 |
| text-clustering-on-mteb | BERT | V-Measure: 30.12 |
| text-clustering-on-mteb | SGPT-5.8B-msmarco | V-Measure: 40.35 |
| text-clustering-on-mteb | MiniLM-L12-multilingual | V-Measure: 37.14 |
| text-clustering-on-mteb | MPNet | V-Measure: 43.69 |
| text-clustering-on-mteb | ST5-XXL | V-Measure: 43.71 |
| text-clustering-on-mteb | SimCSE-BERT-sup | V-Measure: 33.43 |
| text-clustering-on-mteb | Ada Similarity | V-Measure: 37.52 |
| text-clustering-on-mteb | GTR-Base | V-Measure: 38.63 |
| text-clustering-on-mteb | Contriever | V-Measure: 41.1 |
| text-clustering-on-mteb | GTR-XXL | V-Measure: 42.42 |
| text-clustering-on-mteb | GTR-Large | V-Measure: 41.6 |
| text-retrieval-on-mteb | BERT | nDCG@10: 10.59 |
| text-retrieval-on-mteb | ST5-XL | nDCG@10: 38.47 |
| text-retrieval-on-mteb | MPNet-multilingual | nDCG@10: 35.34 |
| text-retrieval-on-mteb | SPECTER | nDCG@10: 15.88 |
| text-retrieval-on-mteb | MiniLM-L12 | nDCG@10: 42.69 |
| text-retrieval-on-mteb | GTR-Large | nDCG@10: 47.42 |
| text-retrieval-on-mteb | coCondenser-msmarco | nDCG@10: 32.96 |
| text-retrieval-on-mteb | ST5-Large | nDCG@10: 36.71 |
| text-retrieval-on-mteb | Glove | nDCG@10: 21.62 |
| text-retrieval-on-mteb | LaBSE | nDCG@10: 18.99 |
| text-retrieval-on-mteb | MiniLM-L6 | nDCG@10: 41.95 |
| text-retrieval-on-mteb | SGPT-5.8B-nli | nDCG@10: 32.34 |
| text-retrieval-on-mteb | MPNet | nDCG@10: 43.81 |
| text-retrieval-on-mteb | GTR-Base | nDCG@10: 44.67 |
| text-retrieval-on-mteb | GTR-XXL | nDCG@10: 48.48 |
| text-retrieval-on-mteb | LASER2 | nDCG@10: 7.93 |
| text-retrieval-on-mteb | ST5-XXL | nDCG@10: 42.24 |
| text-retrieval-on-mteb | SGPT-BLOOM-7.1B-msmarco | nDCG@10: 48.21 |
| text-retrieval-on-mteb | SGPT-1.3B-msmarco | nDCG@10: 44.49 |
| text-retrieval-on-mteb | GTR-XL | nDCG@10: 47.96 |
| text-retrieval-on-mteb | SimCSE-BERT-sup | nDCG@10: 21.82 |
| text-retrieval-on-mteb | SimCSE-BERT-unsup | nDCG@10: 20.29 |
| text-retrieval-on-mteb | Komninos | nDCG@10: 21.22 |
| text-retrieval-on-mteb | SGPT-2.7B-msmarco | nDCG@10: 46.54 |
| text-retrieval-on-mteb | SGPT-125M-msmarco | nDCG@10: 37.04 |
| text-retrieval-on-mteb | SGPT-125M-nli | nDCG@10: 20.9 |
| text-retrieval-on-mteb | SGPT-5.8B-msmarco | nDCG@10: 50.25 |
| text-retrieval-on-mteb | MiniLM-L12-multilingual | nDCG@10: 32.45 |
| text-retrieval-on-mteb | ST5-Base | nDCG@10: 33.63 |
| text-retrieval-on-mteb | Contriever | nDCG@10: 41.88 |
| text-summarization-on-mteb | LASER2 | Spearman Correlation: 26.8 |
| text-summarization-on-mteb | Contriever | Spearman Correlation: 30.36 |
| text-summarization-on-mteb | GTR-XL | Spearman Correlation: 30.21 |
| text-summarization-on-mteb | ST5-Large | Spearman Correlation: 29.64 |
| text-summarization-on-mteb | ST5-Base | Spearman Correlation: 31.39 |
| text-summarization-on-mteb | Glove | Spearman Correlation: 28.87 |
| text-summarization-on-mteb | MPNet-multilingual | Spearman Correlation: 31.57 |
| text-summarization-on-mteb | Komninos | Spearman Correlation: 30.49 |
| text-summarization-on-mteb | SGPT-BLOOM-7.1B-msmarco | Spearman Correlation: 24.99 |
| text-summarization-on-mteb | GTR-Base | Spearman Correlation: 29.67 |
| text-summarization-on-mteb | MiniLM-L6 | Spearman Correlation: 30.81 |
| text-summarization-on-mteb | SimCSE-BERT-unsup | Spearman Correlation: 31.15 |
| text-summarization-on-mteb | SimCSE-BERT-sup | Spearman Correlation: 23.31 |
| text-summarization-on-mteb | GTR-XXL | Spearman Correlation: 30.64 |
| text-summarization-on-mteb | MiniLM-L12 | Spearman Correlation: 27.9 |
| text-summarization-on-mteb | coCondenser-msmarco | Spearman Correlation: 29.5 |
| text-summarization-on-mteb | SGPT-5.8B-msmarco | Spearman Correlation: 24.75 |
| text-summarization-on-mteb | SGPT-125M-nli | Spearman Correlation: 30.26 |
| text-summarization-on-mteb | MPNet | Spearman Correlation: 27.49 |
| text-summarization-on-mteb | ST5-XL | Spearman Correlation: 29.91 |
| text-summarization-on-mteb | Ada Similarity | Spearman Correlation: 26.94 |
| text-summarization-on-mteb | SGPT-1.3B-msmarco | Spearman Correlation: 25.44 |
| text-summarization-on-mteb | MiniLM-L12-multilingual | Spearman Correlation: 30.67 |
| text-summarization-on-mteb | BERT | Spearman Correlation: 29.82 |
| text-summarization-on-mteb | ST5-XXL | Spearman Correlation: 30.08 |
| text-summarization-on-mteb | SPECTER | Spearman Correlation: 27.66 |
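
Two of the metrics reported in the table above, Spearman correlation and nDCG@10, can be illustrated with a short, dependency-free sketch. This is not the benchmark's own implementation: the function names (`cosine`, `spearman`, `ndcg_at_k`) and the toy vectors are hypothetical, and the sketch assumes that pairs are scored by cosine similarity and that nDCG uses linear gain with a log2 rank discount. The functions return values in the 0 to 1 range (Spearman in -1 to 1), while the table reports scores multiplied by 100.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def ndcg_at_k(relevances, k=10):
    """nDCG@k for one query.

    `relevances` holds the graded relevance of every candidate document,
    listed in the order the system ranked them. Assumption: linear gain.
    """
    def dcg(rels):
        return sum(r / math.log2(pos + 2) for pos, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Toy STS-style data: hypothetical embedding pairs with gold similarity scores.
    pairs = [([1.0, 0.0], [1.0, 0.1]),
             ([1.0, 0.0], [0.5, 0.5]),
             ([1.0, 0.0], [0.0, 1.0])]
    gold = [4.8, 2.5, 0.2]
    predicted = [cosine(u, v) for u, v in pairs]
    print("Spearman x 100:", 100 * spearman(predicted, gold))

    # Toy retrieval data: relevance labels of the ranked list for one query.
    print("nDCG@10 x 100:", 100 * ndcg_at_k([0, 1, 0, 2]))
```

In a full evaluation, a per-query metric such as nDCG@10 would be averaged over all queries of a dataset, and dataset scores would then be averaged per task. That aggregation is left out here to keep the sketch short.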