
Abstract
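In natural language processing (NLP), a large number of tasks involve pairwise comparison between two sequences (for example, sentence similarity and paraphrase identification). Two formulations dominate sentence-pair tasks: bi-encoders and cross-encoders. Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient, but they usually underperform cross-encoders. Cross-encoders use their attention mechanism to capture inter-sentence interactions and therefore perform better, but they require task fine-tuning and are computationally more expensive. In this paper, we present a completely unsupervised sentence representation model, Trans-Encoder, which combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi-encoders and cross-encoders. Specifically, starting from a pre-trained language model (PLM), we first convert it into an unsupervised bi-encoder and then alternate training between the bi-encoder and cross-encoder task formulations. In each alternation, one task formulation produces pseudo-labels that serve as the learning signal for the other. We further propose an extension that runs this self-distillation process on multiple PLMs in parallel and aggregates their pseudo-labels for mutual-distillation. To the best of our knowledge, Trans-Encoder is the first completely unsupervised cross-encoder, and it also yields a state-of-the-art unsupervised bi-encoder for sentence similarity. On the sentence similarity benchmarks, both the bi-encoder and cross-encoder formulations of Trans-Encoder significantly outperform recently proposed state-of-the-art unsupervised sentence encoders such as Mirror-BERT and SimCSE, by up to 5%.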
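The alternating scheme described in the abstract can be sketched in a few lines of control flow. This is a minimal illustration, not the code from the repository: `self_distill`, `pseudo_label`, `average_labels`, the `Trainer` interface, and the toy `jaccard` and `memorizing_trainer` stand-ins are all hypothetical names, and averaging is assumed here as one simple way to aggregate pseudo-labels. In a real setup each trainer would fine-tune a PLM on the pseudo-labels.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[Pair], float]  # maps a sentence pair to a similarity score
Trainer = Callable[[Sequence[Pair], Sequence[float]], Scorer]  # fits a scorer to labels


def pseudo_label(scorer: Scorer, pairs: Sequence[Pair]) -> List[float]:
    """Score every unlabeled pair with the current model."""
    return [scorer(pair) for pair in pairs]


def self_distill(init_bi: Scorer, train_cross: Trainer, train_bi: Trainer,
                 pairs: Sequence[Pair], cycles: int = 2) -> Tuple[Scorer, Scorer]:
    """Alternate between the two formulations; each learns from the other's labels."""
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    bi = init_bi
    for _ in range(cycles):
        # The bi-encoder's scores become training targets for the cross-encoder.
        cross = train_cross(pairs, pseudo_label(bi, pairs))
        # The cross-encoder's scores become training targets for the bi-encoder.
        bi = train_bi(pairs, pseudo_label(cross, pairs))
    return bi, cross


def average_labels(label_sets: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate pseudo-labels from several models (assumed here: plain mean)."""
    return [sum(column) / len(column) for column in zip(*label_sets)]


# --- Toy stand-ins so the sketch runs without any neural network ---

def jaccard(pair: Pair) -> float:
    """Hypothetical initial 'bi-encoder': word-overlap similarity."""
    a, b = (set(sentence.lower().split()) for sentence in pair)
    return len(a & b) / len(a | b) if a | b else 0.0


def memorizing_trainer(pairs: Sequence[Pair], labels: Sequence[float]) -> Scorer:
    """Hypothetical 'training': memorize the pseudo-labels in a lookup table."""
    table = dict(zip(pairs, labels))
    return lambda pair: table[pair]


pairs = [("a cat sits", "a cat sleeps"), ("a cat sits", "stocks fell today")]
bi, cross = self_distill(jaccard, memorizing_trainer, memorizing_trainer, pairs)
# Mutual-distillation step: combine this model's labels with another model's labels.
ensemble = average_labels([pseudo_label(bi, pairs), [1.0, 0.5]])
```

The memorizing stand-ins only pass labels through unchanged, so the toy run shows the data flow rather than any improvement; the gains reported in the abstract come from the models themselves, which generalize differently in each formulation.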
Code repository
amzn/trans-encoder (official, PyTorch)
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| semantic-textual-similarity-on-sick | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.7133 |
| semantic-textual-similarity-on-sick | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.7163 |
| semantic-textual-similarity-on-sick | Trans-Encoder-BERT-base-cross (unsup.) | Spearman Correlation: 0.6952 |
| semantic-textual-similarity-on-sick | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.7276 |
| semantic-textual-similarity-on-sick | Trans-Encoder-BERT-large-cross (unsup.) | Spearman Correlation: 0.7192 |
| semantic-textual-similarity-on-sts-benchmark | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.867 |
| semantic-textual-similarity-on-sts-benchmark | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.839 |
| semantic-textual-similarity-on-sts-benchmark | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.8616 |
| semantic-textual-similarity-on-sts-benchmark | Trans-Encoder-RoBERTa-large-bi (unsup.) | Spearman Correlation: 0.8655 |
| semantic-textual-similarity-on-sts-benchmark | Trans-Encoder-RoBERTa-base-cross (unsup.) | Spearman Correlation: 0.8465 |
| semantic-textual-similarity-on-sts12 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.7819 |
| semantic-textual-similarity-on-sts12 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.7828 |
| semantic-textual-similarity-on-sts12 | Trans-Encoder-RoBERTa-base-cross (unsup.) | Spearman Correlation: 0.7637 |
| semantic-textual-similarity-on-sts12 | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.7509 |
| semantic-textual-similarity-on-sts13 | Trans-Encoder-BERT-base-cross (unsup.) | Spearman Correlation: 0.8559 |
| semantic-textual-similarity-on-sts13 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.8831 |
| semantic-textual-similarity-on-sts13 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.8851 |
| semantic-textual-similarity-on-sts13 | Trans-Encoder-BERT-large-cross (unsup.) | Spearman Correlation: 0.8831 |
| semantic-textual-similarity-on-sts13 | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.851 |
| semantic-textual-similarity-on-sts14 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.8137 |
| semantic-textual-similarity-on-sts14 | Trans-Encoder-RoBERTa-large-bi (unsup.) | Spearman Correlation: 0.8176 |
| semantic-textual-similarity-on-sts14 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.8194 |
| semantic-textual-similarity-on-sts14 | Trans-Encoder-RoBERTa-base-cross (unsup.) | Spearman Correlation: 0.7903 |
| semantic-textual-similarity-on-sts14 | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.779 |
| semantic-textual-similarity-on-sts15 | Trans-Encoder-RoBERTa-base-cross (unsup.) | Spearman Correlation: 0.8577 |
| semantic-textual-similarity-on-sts15 | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.8508 |
| semantic-textual-similarity-on-sts15 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.8816 |
| semantic-textual-similarity-on-sts15 | Trans-Encoder-BERT-base-cross (unsup.) | Spearman Correlation: 0.8444 |
| semantic-textual-similarity-on-sts15 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.8863 |
| semantic-textual-similarity-on-sts16 | Trans-Encoder-BERT-base-bi (unsup.) | Spearman Correlation: 0.8305 |
| semantic-textual-similarity-on-sts16 | Trans-Encoder-RoBERTa-large-cross (unsup.) | Spearman Correlation: 0.8503 |
| semantic-textual-similarity-on-sts16 | Trans-Encoder-BERT-large-bi (unsup.) | Spearman Correlation: 0.8481 |
| semantic-textual-similarity-on-sts16 | Trans-Encoder-RoBERTa-base-cross (unsup.) | Spearman Correlation: 0.8377 |
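Every row above reports a Spearman correlation between model similarity scores and human ratings. As a reminder of what that number measures, here is a minimal pure-Python sketch that computes it as the Pearson correlation of average ranks. The function names `average_ranks` and `spearman` are illustrative, and the sample scores are made up for the example, not taken from any benchmark.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up example: human ratings on a 0-5 scale vs. model similarity scores.
gold = [0.0, 2.5, 5.0, 3.0]
predicted = [0.1, 0.4, 0.9, 0.8]
rho = spearman(gold, predicted)
```

Because only the ordering of the scores matters, a model can reach a high value without its scores matching the scale of the human ratings.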