
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Abstract

Transfer learning has fundamentally reshaped the landscape of natural language processing (NLP) research. Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, because data for downstream tasks is limited while pre-trained models have extremely high capacity, aggressive fine-tuning often causes the model to overfit the downstream task and to forget the knowledge acquired during pre-training. To address these issues in a more principled manner, we propose a new computational framework for robust and efficient fine-tuning of pre-trained language models. Specifically, the framework has two key components: 1) smoothness-inducing regularization, which effectively manages the model's capacity; and 2) Bregman proximal point optimization, a class of trust-region methods that effectively prevents knowledge forgetting. Experiments show that the proposed method achieves state-of-the-art performance on multiple NLP benchmarks.
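The two components can be illustrated with a toy NumPy sketch (all function names here are hypothetical; the official implementation is in namisan/mt-dnn). The smoothness-inducing regularizer perturbs the input within a small l-infinity ball and penalizes the symmetrized KL divergence between the perturbed and clean predictions; the Bregman proximal term keeps the current model's predictions close to those of the previous iterate. The single finite-difference ascent step below stands in for the projected-gradient inner loop of the actual method.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sym_kl(p, q, eps=1e-12):
    """Symmetrized KL divergence between two batches of distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))
                        + q * np.log((q + eps) / (p + eps))))

def smoothness_penalty(W, x, epsilon=1e-3, step=1e-4, rng=None):
    """Smoothness-inducing regularizer R_s: approximately maximize the
    symmetrized KL between f(x + delta) and f(x) over ||delta||_inf <= epsilon,
    via one finite-difference ascent step on delta (a toy stand-in for the
    projected-gradient inner loop used in practice)."""
    rng = rng or np.random.default_rng(0)
    p_clean = softmax(x @ W)
    delta = rng.uniform(-epsilon, epsilon, size=x.shape)
    # finite-difference gradient of the divergence w.r.t. each delta entry
    g = np.zeros_like(delta)
    for idx in np.ndindex(*delta.shape):
        d = delta.copy()
        d[idx] += 1e-6
        g[idx] = (sym_kl(softmax((x + d) @ W), p_clean)
                  - sym_kl(softmax((x + delta) @ W), p_clean)) / 1e-6
    delta = np.clip(delta + step * np.sign(g), -epsilon, epsilon)  # project back
    return sym_kl(softmax((x + delta) @ W), p_clean)

def bregman_proximal_term(W, W_prev, x):
    """Trust-region term: penalize divergence between the current model's
    predictions and the previous iterate's, discouraging forgetting."""
    return sym_kl(softmax(x @ W), softmax(x @ W_prev))

# toy usage: 4 examples, 5 features, 3 classes
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 5))
W, W_prev = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
task_loss = 0.0  # stand-in for the usual cross-entropy on labels
lam, mu = 1.0, 1.0
total = task_loss + lam * smoothness_penalty(W, x) + mu * bregman_proximal_term(W, W_prev, x)
```

Both regularizers are nonnegative and vanish when the perturbed (or previous) predictions match the current ones, so they only constrain the model, never reward drift.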

Code Repositories

- namisan/mt-dnn (official; PyTorch; mentioned on GitHub)
- cliang1453/camero (PyTorch; mentioned on GitHub)
- microsoft/MT-DNN (PyTorch; mentioned on GitHub)
- chunhuililili/mt_dnn (PyTorch; mentioned on GitHub)

Benchmarks

| Benchmark | Method | Metrics |
| --- | --- | --- |
| natural-language-inference-on-ax | T5 | Accuracy: 53.1 |
| natural-language-inference-on-mnli-snli-anli | SMARTRoBERTa-LARGE | % Dev Accuracy: 57.1; % Test Accuracy: 57.1 |
| natural-language-inference-on-multinli | SMART+BERT-BASE | Accuracy: 85.6 |
| natural-language-inference-on-multinli | T5 | Matched: 92.0; Mismatched: 91.7 |
| natural-language-inference-on-multinli | SMARTRoBERTa | Dev Matched: 91.1; Dev Mismatched: 91.3 |
| natural-language-inference-on-multinli | MT-DNN-SMARTv0 | Accuracy: 85.7 |
| natural-language-inference-on-multinli | SMART-BERT | Dev Matched: 85.6; Dev Mismatched: 86.0 |
| natural-language-inference-on-multinli | MT-DNN-SMART | Accuracy: 85.7 |
| natural-language-inference-on-qnli | SMART-BERT | - |
| natural-language-inference-on-qnli | ALICE | Accuracy: 99.2% |
| natural-language-inference-on-qnli | MT-DNN-SMART | Accuracy: 99.2% |
| natural-language-inference-on-qnli | SMARTRoBERTa | - |
| natural-language-inference-on-rte | T5-XXL 11B | Accuracy: 92.5% |
| natural-language-inference-on-rte | SMART-BERT | Accuracy: 71.2% |
| natural-language-inference-on-rte | SMARTRoBERTa | Accuracy: 92.0% |
| natural-language-inference-on-rte | SMART | Accuracy: 71.2% |
| natural-language-inference-on-scitail | MT-DNN-SMART_1%ofTrainingData | Dev Accuracy: 88.6 |
| natural-language-inference-on-scitail | MT-DNN-SMARTLARGEv0 | % Dev Accuracy: 96.6; % Test Accuracy: 95.2 |
| natural-language-inference-on-scitail | MT-DNN-SMART_0.1%ofTrainingData | Dev Accuracy: 82.3 |
| natural-language-inference-on-scitail | MT-DNN-SMART_100%ofTrainingData | Dev Accuracy: 96.1 |
| natural-language-inference-on-scitail | MT-DNN-SMART_10%ofTrainingData | Dev Accuracy: 91.3 |
| natural-language-inference-on-snli | MT-DNN-SMART_0.1%ofTrainingData | Dev Accuracy: 82.7 |
| natural-language-inference-on-snli | MT-DNN-SMARTLARGEv0 | % Dev Accuracy: 92.6; % Test Accuracy: 91.7 |
| natural-language-inference-on-snli | MT-DNN-SMART_1%ofTrainingData | Dev Accuracy: 86 |
| natural-language-inference-on-snli | MT-DNN-SMART_100%ofTrainingData | Dev Accuracy: 91.6 |
| natural-language-inference-on-snli | MT-DNN-SMART_10%ofTrainingData | Dev Accuracy: 88.7 |
| natural-language-understanding-on-glue | MT-DNN-SMART | Average: 89.9 |
| paraphrase-identification-on-quora-question | SMART-BERT | Dev Accuracy: 91.5; Dev F1: 88.5 |
| paraphrase-identification-on-quora-question | FreeLB | Accuracy: 74.8; Dev Accuracy: 92.6 |
| paraphrase-identification-on-quora-question | ALICE | F1: 90.7 |
| semantic-textual-similarity-on-mrpc | SMART-BERT | - |
| semantic-textual-similarity-on-mrpc | SMART | Accuracy: 91.3% |
| semantic-textual-similarity-on-mrpc | SMARTRoBERTa | - |
| semantic-textual-similarity-on-mrpc | MT-DNN-SMART | Accuracy: 93.7%; F1: 91.7 |
| semantic-textual-similarity-on-sts-benchmark | SMART-BERT | Dev Pearson Correlation: 90.0; Dev Spearman Correlation: 89.4 |
| semantic-textual-similarity-on-sts-benchmark | MT-DNN-SMART | Pearson Correlation: 0.929; Spearman Correlation: 0.925 |
| semantic-textual-similarity-on-sts-benchmark | SMARTRoBERTa | Dev Pearson Correlation: 92.8; Dev Spearman Correlation: 92.6 |
| sentiment-analysis-on-sst-2-binary | SMART+BERT-BASE | Accuracy: 93 |
| sentiment-analysis-on-sst-2-binary | MT-DNN | Accuracy: 93.6 |
| sentiment-analysis-on-sst-2-binary | SMART-MT-DNN | Dev Accuracy: 96.1 |
| sentiment-analysis-on-sst-2-binary | SMART-BERT | Dev Accuracy: 93.0 |
| sentiment-analysis-on-sst-2-binary | MT-DNN-SMART | Accuracy: 97.5 |
| sentiment-analysis-on-sst-2-binary | SMARTRoBERTa | Dev Accuracy: 96.9 |
