Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations
Fangyu Liu Yunlong Jiao Jordan Massiah Emine Yilmaz Serhii Havrylov

Abstract
In NLP, a large number of tasks involve pairwise comparison of two sequences (e.g., sentence similarity and paraphrase identification). Two formulations predominate for sentence-pair tasks: bi-encoders and cross-encoders. Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient, but they usually underperform cross-encoders. Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance, but they require task-specific fine-tuning and are computationally more expensive. In this paper, we present a completely unsupervised sentence representation model, termed Trans-Encoder, that combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders. Specifically, starting from a pre-trained language model (PLM), we first convert it into an unsupervised bi-encoder and then alternate between the bi- and cross-encoder task formulations. In each alternation, one formulation produces pseudo-labels that serve as learning signals for the other. We further propose an extension that conducts this self-distillation on multiple PLMs in parallel and uses the average of their pseudo-labels for mutual distillation. Trans-Encoder creates, to the best of our knowledge, the first completely unsupervised cross-encoder and also a state-of-the-art unsupervised bi-encoder for sentence similarity. Both the bi-encoder and cross-encoder formulations of Trans-Encoder outperform recently proposed state-of-the-art unsupervised sentence encoders such as Mirror-BERT and SimCSE by up to 5% on the sentence similarity benchmarks.
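The core loop described above — a bi-encoder produces pseudo-labels, a cross-encoder is trained on them, and vice versa — can be illustrated with a toy sketch. This is not the authors' implementation: a bag-of-words cosine stands in for a PLM bi-encoder, a linear model over the concatenated pair stands in for a cross-encoder, and least-squares regression stands in for fine-tuning. All function names and data here are illustrative.

```python
import math
from collections import Counter

def embed(sentence):
    """Toy bi-encoder embedding: bag-of-words counts (stand-in for a PLM)."""
    return Counter(sentence.lower().split())

def bi_score(a, b):
    """Bi-encoder formulation: each sentence is encoded independently,
    then compared by cosine similarity."""
    va, vb = embed(a), embed(b)
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def cross_score(a, b, weights):
    """Toy cross-encoder formulation: the pair is processed jointly
    (here, a linear score over the concatenated pair's features)."""
    joint = embed(a + " " + b)
    return sum(weights.get(w, 0.0) * c for w, c in joint.items())

def distil_cross_from_bi(pairs, lr=0.1, epochs=50):
    """One self-distillation step: the bi-encoder produces pseudo-labels,
    and the cross-encoder is fit to them (a stand-in for fine-tuning).
    Trans-Encoder then alternates: the trained cross-encoder would in
    turn produce pseudo-labels to refine the bi-encoder."""
    weights = {}
    pseudo = [bi_score(a, b) for a, b in pairs]  # pseudo-labels from bi-encoder
    for _ in range(epochs):
        for (a, b), y in zip(pairs, pseudo):
            err = cross_score(a, b, weights) - y
            # SGD on squared error between cross-encoder score and pseudo-label
            for w, c in embed(a + " " + b).items():
                weights[w] = weights.get(w, 0.0) - lr * err * c
    return weights

pairs = [("the cat sat", "a cat sat"), ("dogs bark loudly", "stock prices fell")]
w = distil_cross_from_bi(pairs)
```

After distillation, the toy cross-encoder reproduces the bi-encoder's similarity scores on the training pairs; in the paper, the alternation (and the mutual-distillation extension, which averages pseudo-labels from several PLMs) is what lets both formulations improve beyond their starting point.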
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| SICK | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.7133 |
| SICK | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.7163 |
| SICK | Trans-Encoder-BERT-base-cross (unsup.) | Spearman Correlation: 0.6952 |
| SICK | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.7276 |
| SICK | Trans-Encoder-BERT-large-cross (unsup.) | Spearman Correlation: 0.7192 |
| STS Benchmark | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.867 |
| STS Benchmark | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.839 |
| STS Benchmark | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.8616 |
| STS Benchmark | Trans-Encoder-RoBERTa-large-bi (unsup.) | Spearman Correlation: 0.8655 |
| STS Benchmark | Trans-Encoder-RoBERTa-base-cross (unsup.) | Spearman Correlation: 0.8465 |
| STS12 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.7819 |
| STS12 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.7828 |
| STS12 | Trans-Encoder-RoBERTa-base-cross (unsup.) | Spearman Correlation: 0.7637 |
| STS12 | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.7509 |
| STS13 | Trans-Encoder-BERT-base-cross (unsup.) | Spearman Correlation: 0.8559 |
| STS13 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.8831 |
| STS13 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.8851 |
| STS13 | Trans-Encoder-BERT-large-cross (unsup.) | Spearman Correlation: 0.8831 |
| STS13 | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.851 |
| STS14 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.8137 |
| STS14 | Trans-Encoder-RoBERTa-large-bi (unsup.) | Spearman Correlation: 0.8176 |
| STS14 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.8194 |
| STS14 | Trans-Encoder-RoBERTa-base-cross (unsup.) | Spearman Correlation: 0.7903 |
| STS14 | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.779 |
| STS15 | Trans-Encoder-RoBERTa-base-cross (unsup.) | Spearman Correlation: 0.8577 |
| STS15 | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.8508 |
| STS15 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.8816 |
| STS15 | Trans-Encoder-BERT-base-cross (unsup.) | Spearman Correlation: 0.8444 |
| STS15 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.8863 |
| STS16 | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.8305 |
| STS16 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.8503 |
| STS16 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.8481 |
| STS16 | Trans-Encoder-RoBERTa-base-cross (unsup.) | Spearman Correlation: 0.8377 |