Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations

Fangyu Liu, Yunlong Jiao, Jordan Massiah, Emine Yilmaz, Serhii Havrylov

Abstract

In NLP, a large number of tasks involve pairwise comparison of two sequences (e.g. sentence similarity and paraphrase identification). Two formulations predominate for sentence-pair tasks: bi-encoders and cross-encoders. Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient; however, they usually underperform cross-encoders. Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance, but they require task fine-tuning and are computationally more expensive. In this paper, we present a completely unsupervised sentence representation model, termed Trans-Encoder, that combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders. Specifically, starting from a pre-trained language model (PLM), we first convert it into an unsupervised bi-encoder and then alternate between the bi- and cross-encoder task formulations. In each alternation, one task formulation produces pseudo-labels that serve as learning signals for the other. We then propose an extension that conducts this self-distillation on multiple PLMs in parallel and uses the average of their pseudo-labels for mutual distillation. Trans-Encoder creates, to the best of our knowledge, the first completely unsupervised cross-encoder, as well as a state-of-the-art unsupervised bi-encoder for sentence similarity. Both the bi-encoder and cross-encoder formulations of Trans-Encoder outperform recently proposed state-of-the-art unsupervised sentence encoders such as Mirror-BERT and SimCSE by up to 5% on the sentence-similarity benchmarks.
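The alternation described in the abstract can be sketched in a few lines. The snippet below is a minimal, hedged illustration of the self-distillation loop (not the authors' implementation): a bi-encoder scores unlabeled sentence pairs with cosine similarity, a cross-encoder is trained on those pseudo-labels, and the cross-encoder's predictions are then distilled back into the bi-encoder. It assumes the sentence-transformers library; the starting checkpoint, hyper-parameters, and the tiny `pairs` list are illustrative placeholders. Mutual distillation would run the same loop over several PLMs and average their pseudo-labels at each step.

```python
# Minimal sketch of the Trans-Encoder self-distillation cycle (assumptions:
# sentence-transformers is installed, "bert-base-uncased" is the starting PLM,
# and `pairs` stands in for a real corpus of unlabeled sentence pairs).
from torch.utils.data import DataLoader
from sentence_transformers import (SentenceTransformer, CrossEncoder,
                                   InputExample, losses, util)

pairs = [
    ("A man is playing a guitar.", "Someone is playing an instrument."),
    ("A dog runs through the park.", "The weather is sunny today."),
]

bi_encoder = SentenceTransformer("bert-base-uncased")            # placeholder checkpoint
cross_encoder = CrossEncoder("bert-base-uncased", num_labels=1)  # placeholder checkpoint

for cycle in range(2):  # alternate bi -> cross -> bi -> ...
    # 1) Bi-encoder produces pseudo-labels: cosine similarity of the pair embeddings.
    emb_a = bi_encoder.encode([a for a, _ in pairs], convert_to_tensor=True)
    emb_b = bi_encoder.encode([b for _, b in pairs], convert_to_tensor=True)
    bi_scores = util.cos_sim(emb_a, emb_b).diagonal().clamp(0, 1).tolist()

    # 2) Distil the bi-encoder's knowledge into the cross-encoder.
    cross_examples = [InputExample(texts=[a, b], label=s)
                      for (a, b), s in zip(pairs, bi_scores)]
    cross_encoder.fit(train_dataloader=DataLoader(cross_examples, shuffle=True, batch_size=2),
                      epochs=1, warmup_steps=0)

    # 3) Cross-encoder re-scores the pairs; distil those scores back into the bi-encoder.
    cross_scores = cross_encoder.predict([[a, b] for a, b in pairs]).tolist()
    bi_examples = [InputExample(texts=[a, b], label=float(s))
                   for (a, b), s in zip(pairs, cross_scores)]
    bi_loader = DataLoader(bi_examples, shuffle=True, batch_size=2)
    bi_encoder.fit(train_objectives=[(bi_loader, losses.CosineSimilarityLoss(bi_encoder))],
                   epochs=1, warmup_steps=0)
```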

Code Repositories

amzn/trans-encoder (Official, PyTorch), mentioned in GitHub

Benchmarks

| Benchmark | Methodology | Spearman Correlation |
| --- | --- | --- |
| semantic-textual-similarity-on-sick | Trans-Encoder-BERT-large-bi (unsup.) | 0.7133 |
| semantic-textual-similarity-on-sick | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.7163 |
| semantic-textual-similarity-on-sick | Trans-Encoder-BERT-base-cross (unsup.) | 0.6952 |
| semantic-textual-similarity-on-sick | Trans-Encoder-BERT-base-bi (unsup.) | 0.7276 |
| semantic-textual-similarity-on-sick | Trans-Encoder-BERT-large-cross (unsup.) | 0.7192 |
| semantic-textual-similarity-on-sts-benchmark | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.867 |
| semantic-textual-similarity-on-sts-benchmark | Trans-Encoder-BERT-base-bi (unsup.) | 0.839 |
| semantic-textual-similarity-on-sts-benchmark | Trans-Encoder-BERT-large-bi (unsup.) | 0.8616 |
| semantic-textual-similarity-on-sts-benchmark | Trans-Encoder-RoBERTa-large-bi (unsup.) | 0.8655 |
| semantic-textual-similarity-on-sts-benchmark | Trans-Encoder-RoBERTa-base-cross (unsup.) | 0.8465 |
| semantic-textual-similarity-on-sts12 | Trans-Encoder-BERT-large-bi (unsup.) | 0.7819 |
| semantic-textual-similarity-on-sts12 | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.7828 |
| semantic-textual-similarity-on-sts12 | Trans-Encoder-RoBERTa-base-cross (unsup.) | 0.7637 |
| semantic-textual-similarity-on-sts12 | Trans-Encoder-BERT-base-bi (unsup.) | 0.7509 |
| semantic-textual-similarity-on-sts13 | Trans-Encoder-BERT-base-cross (unsup.) | 0.8559 |
| semantic-textual-similarity-on-sts13 | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.8831 |
| semantic-textual-similarity-on-sts13 | Trans-Encoder-BERT-large-bi (unsup.) | 0.8851 |
| semantic-textual-similarity-on-sts13 | Trans-Encoder-BERT-large-cross (unsup.) | 0.8831 |
| semantic-textual-similarity-on-sts13 | Trans-Encoder-BERT-base-bi (unsup.) | 0.851 |
| semantic-textual-similarity-on-sts14 | Trans-Encoder-BERT-large-bi (unsup.) | 0.8137 |
| semantic-textual-similarity-on-sts14 | Trans-Encoder-RoBERTa-large-bi (unsup.) | 0.8176 |
| semantic-textual-similarity-on-sts14 | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.8194 |
| semantic-textual-similarity-on-sts14 | Trans-Encoder-RoBERTa-base-cross (unsup.) | 0.7903 |
| semantic-textual-similarity-on-sts14 | Trans-Encoder-BERT-base-bi (unsup.) | 0.779 |
| semantic-textual-similarity-on-sts15 | Trans-Encoder-RoBERTa-base-cross (unsup.) | 0.8577 |
| semantic-textual-similarity-on-sts15 | Trans-Encoder-BERT-base-bi (unsup.) | 0.8508 |
| semantic-textual-similarity-on-sts15 | Trans-Encoder-BERT-large-bi (unsup.) | 0.8816 |
| semantic-textual-similarity-on-sts15 | Trans-Encoder-BERT-base-cross (unsup.) | 0.8444 |
| semantic-textual-similarity-on-sts15 | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.8863 |
| semantic-textual-similarity-on-sts16 | Trans-Encoder-BERT-base-bi (unsup.) | 0.8305 |
| semantic-textual-similarity-on-sts16 | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.8503 |
| semantic-textual-similarity-on-sts16 | Trans-Encoder-BERT-large-bi (unsup.) | 0.8481 |
| semantic-textual-similarity-on-sts16 | Trans-Encoder-RoBERTa-base-cross (unsup.) | 0.8377 |
