Trans-Encoder: Unsupervised Sentence-Pair Modelling through Self- and Mutual-Distillations

Abstract

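In natural language processing (NLP), a large number of tasks involve pairwise comparison between two sequences (for example, sentence similarity and paraphrase identification). Two formulations dominate sentence-pair tasks: bi-encoders and cross-encoders. Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient, but they usually underperform cross-encoders. Cross-encoders use their attention mechanism to capture inter-sentence interactions and therefore achieve better performance, but they require task fine-tuning and are computationally more expensive. This paper presents Trans-Encoder, a completely unsupervised sentence representation model that combines the two learning paradigms into an iterative joint framework to learn enhanced bi-encoders and cross-encoders simultaneously. Specifically, starting from a pre-trained language model (PLM), we first convert it into an unsupervised bi-encoder and then alternate between the bi-encoder and cross-encoder task formulations. In each alternation, one task formulation produces pseudo-labels that serve as the learning signal for the other. We further propose an extension that runs this self-distillation procedure on multiple PLMs in parallel and aggregates their pseudo-labels for mutual-distillation. To the best of our knowledge, Trans-Encoder is the first completely unsupervised cross-encoder, and it is also a state-of-the-art unsupervised bi-encoder for sentence similarity. On sentence similarity benchmarks, both the bi-encoder and cross-encoder formulations of Trans-Encoder outperform recently proposed state-of-the-art unsupervised sentence encoders such as Mirror-BERT and SimCSE by up to 5%.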
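The alternation described in the abstract can be sketched as plain control flow. This is a minimal illustration only, not the authors' implementation: the names `self_distill`, `average_pseudo_labels`, `train_cross`, and `train_bi` are hypothetical, the trainers are passed in as opaque callables, and a simple mean is assumed as the aggregation rule for mutual-distillation.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
# A scorer maps sentence pairs to similarity scores (the pseudo-labels).
Scorer = Callable[[Sequence[Pair]], List[float]]
# A trainer fits a model on (pairs, pseudo-labels) and returns its scorer.
Trainer = Callable[[Sequence[Pair], Sequence[float]], Scorer]


def self_distill(
    pairs: Sequence[Pair],
    initial_bi: Scorer,
    train_cross: Trainer,
    train_bi: Trainer,
    cycles: int = 3,
) -> Tuple[Scorer, Scorer]:
    """Alternate between the two formulations (hypothetical sketch).

    Each cycle: the bi-encoder labels the pairs, a cross-encoder is
    trained on those labels, then the cross-encoder labels the pairs
    and a new bi-encoder is trained on them.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    bi = initial_bi
    cross = initial_bi  # placeholder, replaced in the first cycle
    for _ in range(cycles):
        cross = train_cross(pairs, bi(pairs))   # bi -> cross
        bi = train_bi(pairs, cross(pairs))      # cross -> bi
    return bi, cross


def average_pseudo_labels(label_sets: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate pseudo-labels from several models by a per-pair mean.

    Assumption: plain averaging stands in for the aggregation step.
    """
    if not label_sets:
        raise ValueError("need at least one set of pseudo-labels")
    return [sum(column) / len(column) for column in zip(*label_sets)]
```

In a mutual-distillation variant, each model's trainer would receive `average_pseudo_labels([...])` computed over all models' scores rather than its own partner's scores alone.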

Code Repository

amzn/trans-encoder (official, PyTorch)

Benchmarks

| Benchmark | Method | Spearman Correlation |
| --- | --- | --- |
| Semantic Textual Similarity on SICK | Trans-Encoder-BERT-large-bi (unsup.) | 0.7133 |
| Semantic Textual Similarity on SICK | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.7163 |
| Semantic Textual Similarity on SICK | Trans-Encoder-BERT-base-cross (unsup.) | 0.6952 |
| Semantic Textual Similarity on SICK | Trans-Encoder-BERT-base-bi (unsup.) | 0.7276 |
| Semantic Textual Similarity on SICK | Trans-Encoder-BERT-large-cross (unsup.) | 0.7192 |
| Semantic Textual Similarity on STS Benchmark | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.867 |
| Semantic Textual Similarity on STS Benchmark | Trans-Encoder-BERT-base-bi (unsup.) | 0.839 |
| Semantic Textual Similarity on STS Benchmark | Trans-Encoder-BERT-large-bi (unsup.) | 0.8616 |
| Semantic Textual Similarity on STS Benchmark | Trans-Encoder-RoBERTa-large-bi (unsup.) | 0.8655 |
| Semantic Textual Similarity on STS Benchmark | Trans-Encoder-RoBERTa-base-cross (unsup.) | 0.8465 |
| Semantic Textual Similarity on STS12 | Trans-Encoder-BERT-large-bi (unsup.) | 0.7819 |
| Semantic Textual Similarity on STS12 | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.7828 |
| Semantic Textual Similarity on STS12 | Trans-Encoder-RoBERTa-base-cross (unsup.) | 0.7637 |
| Semantic Textual Similarity on STS12 | Trans-Encoder-BERT-base-bi (unsup.) | 0.7509 |
| Semantic Textual Similarity on STS13 | Trans-Encoder-BERT-base-cross (unsup.) | 0.8559 |
| Semantic Textual Similarity on STS13 | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.8831 |
| Semantic Textual Similarity on STS13 | Trans-Encoder-BERT-large-bi (unsup.) | 0.8851 |
| Semantic Textual Similarity on STS13 | Trans-Encoder-BERT-large-cross (unsup.) | 0.8831 |
| Semantic Textual Similarity on STS13 | Trans-Encoder-BERT-base-bi (unsup.) | 0.851 |
| Semantic Textual Similarity on STS14 | Trans-Encoder-BERT-large-bi (unsup.) | 0.8137 |
| Semantic Textual Similarity on STS14 | Trans-Encoder-RoBERTa-large-bi (unsup.) | 0.8176 |
| Semantic Textual Similarity on STS14 | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.8194 |
| Semantic Textual Similarity on STS14 | Trans-Encoder-RoBERTa-base-cross (unsup.) | 0.7903 |
| Semantic Textual Similarity on STS14 | Trans-Encoder-BERT-base-bi (unsup.) | 0.779 |
| Semantic Textual Similarity on STS15 | Trans-Encoder-RoBERTa-base-cross (unsup.) | 0.8577 |
| Semantic Textual Similarity on STS15 | Trans-Encoder-BERT-base-bi (unsup.) | 0.8508 |
| Semantic Textual Similarity on STS15 | Trans-Encoder-BERT-large-bi (unsup.) | 0.8816 |
| Semantic Textual Similarity on STS15 | Trans-Encoder-BERT-base-cross (unsup.) | 0.8444 |
| Semantic Textual Similarity on STS15 | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.8863 |
| Semantic Textual Similarity on STS16 | Trans-Encoder-BERT-base-bi (unsup.) | 0.8305 |
| Semantic Textual Similarity on STS16 | Trans-Encoder-RoBERTa-large-cross (unsup.) | 0.8503 |
| Semantic Textual Similarity on STS16 | Trans-Encoder-BERT-large-bi (unsup.) | 0.8481 |
| Semantic Textual Similarity on STS16 | Trans-Encoder-RoBERTa-base-cross (unsup.) | 0.8377 |
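
The benchmark figures above are Spearman correlations between model similarity scores and human ratings. As a rough illustration of what that metric measures, the sketch below computes it in pure Python by ranking both lists (tied values share their average rank) and taking the Pearson correlation of the ranks. The function names are illustrative and do not come from the Trans-Encoder repository; in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(predicted) != len(gold) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(gold))
```

Because only the ranks matter, a model is rewarded for ordering sentence pairs the same way as the human ratings, regardless of the absolute scale of its scores.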
