Text Embeddings by Weakly-Supervised Contrastive Pre-training
Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

Abstract
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any task requiring a single-vector representation of texts, such as retrieval, clustering, and classification, achieving strong performance in both zero-shot and fine-tuned settings. We conduct extensive evaluations on 56 datasets from the BEIR and MTEB benchmarks. In the zero-shot setting, E5 is the first model to outperform the strong BM25 baseline on the BEIR retrieval benchmark without using any labeled data. When fine-tuned, E5 obtains the best results on the MTEB benchmark, beating existing embedding models with 40x more parameters.
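Since E5 produces a single vector per input text, a typical zero-shot retrieval use is to embed a query and candidate passages with the same encoder and rank passages by cosine similarity. Below is a minimal sketch, assuming the publicly released intfloat/e5-base checkpoint on Hugging Face and its "query: " / "passage: " input-prefix convention; mean pooling over token states followed by L2 normalization is the commonly documented recipe for these checkpoints.

```python
# Minimal E5 usage sketch: embed one query and two passages, rank by cosine.
# Assumes the "intfloat/e5-base" checkpoint; prefixes follow its convention.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base")
model = AutoModel.from_pretrained("intfloat/e5-base")
model.eval()

texts = [
    "query: how are general-purpose text embeddings trained",
    "passage: Contrastive pre-training pulls paired texts together and pushes in-batch negatives apart.",
    "passage: The weather in Paris is mild in spring.",
]

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    out = model(**batch)

# Mean-pool token embeddings, masking out padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()
emb = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
emb = F.normalize(emb, p=2, dim=1)  # unit vectors, so dot product = cosine

scores = emb[0] @ emb[1:].T  # query vs. each passage
print(scores)  # the on-topic passage should receive the higher score
```

The same embeddings can be fed to a clustering algorithm or a linear classifier, which is what "general-purpose" means in practice: one frozen encoder, one vector per text, different downstream heads.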
Code Repositories
https://github.com/microsoft/unilm/tree/master/e5 (official)
Benchmarks

| Benchmark | Model | Wasserstein Distance (WD) | # Correct Groups | # Solved Walls | AMI | ARI | FMS |
|---|---|---|---|---|---|---|---|
| task-1-grouping-on-ocw | E5 (LARGE) | 84.4 ± .7 | 76 ± 5 | 0 ± 0 | 18.5 ± .6 | 15.4 ± .5 | 32.3 ± .4 |
| task-1-grouping-on-ocw | E5 (BASE) | 83.8 ± .6 | 89 ± 6 | 1 ± 0 | 19.5 ± .4 | 16.3 ± .4 | 33.1 ± .3 |

AMI: Adjusted Mutual Information; ARI: Adjusted Rand Index; FMS: Fowlkes-Mallows Score.