SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

Abstract

Transfer learning has fundamentally changed the landscape of natural language processing (NLP) research. Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to the limited data of downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the downstream data and forget the knowledge of the pre-trained model. To address this issue in a more principled manner, we propose a new computational framework for robust and efficient fine-tuning of pre-trained language models. Specifically, the proposed framework contains two important ingredients: (1) smoothness-inducing regularization, which effectively manages the capacity of the model; and (2) Bregman proximal point optimization, a class of trust-region methods that prevents knowledge forgetting. Our experiments demonstrate that the proposed method achieves state-of-the-art performance on multiple NLP benchmarks.
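The two ingredients described in the abstract can be sketched in a few lines of PyTorch (the official mt-dnn repository listed below is PyTorch-based). The sketch below is illustrative, not the official implementation: the callable `model` mapping input embeddings to logits, the names `smoothness_regularizer`, `bregman_proximal_term`, `momentum_update`, and the hyperparameters `eps`, `step_size`, `noise_var`, and `beta` are all assumptions, and the single sign-gradient ascent step is a simplification of the paper's projected-gradient inner loop for finding the perturbation.

```python
# Minimal sketch of SMART's two ingredients, assuming `model(embeds)` maps
# input embeddings to classification logits. Hyperparameter values are
# placeholders, not the settings used in the paper or the released code.
import torch
import torch.nn.functional as F


def symmetric_kl(p_logits, q_logits):
    """Symmetrized KL divergence between two categorical distributions."""
    p = F.log_softmax(p_logits, dim=-1)
    q = F.log_softmax(q_logits, dim=-1)
    return (F.kl_div(p, q, reduction="batchmean", log_target=True)
            + F.kl_div(q, p, reduction="batchmean", log_target=True))


def smoothness_regularizer(model, embeds, eps=1e-3, step_size=1e-3, noise_var=1e-5):
    """Smoothness-inducing adversarial regularizer: find a small perturbation
    of the input embeddings that changes the prediction the most, then
    penalize that change (symmetric KL between clean and perturbed outputs)."""
    with torch.no_grad():
        clean_logits = model(embeds)
    noise = torch.randn_like(embeds) * noise_var ** 0.5
    noise.requires_grad_()
    adv_loss = symmetric_kl(model(embeds + noise), clean_logits)
    grad, = torch.autograd.grad(adv_loss, noise)
    # One ascent step on the perturbation, projected onto an L-inf ball of radius eps
    # (a simplified stand-in for the paper's iterative perturbation search).
    noise = (noise + step_size * grad.sign()).clamp(-eps, eps).detach()
    return symmetric_kl(model(embeds + noise), clean_logits)


def bregman_proximal_term(model, teacher, embeds):
    """Bregman proximal point term: penalize predictions that drift away from
    a slowly updated "teacher" copy of the model (trust-region behaviour)."""
    with torch.no_grad():
        teacher_logits = teacher(embeds)
    return symmetric_kl(model(embeds), teacher_logits)


def momentum_update(teacher, model, beta=0.99):
    """Mean-teacher style update: theta_tilde <- beta*theta_tilde + (1-beta)*theta."""
    for t_p, p in zip(teacher.parameters(), model.parameters()):
        t_p.data.mul_(beta).add_(p.data, alpha=1 - beta)
```

Under these assumptions, a training step would minimize task_loss + lambda_s * smoothness_regularizer(...) + mu * bregman_proximal_term(...), with the teacher initialized as a deep copy of the model and refreshed via momentum_update after each optimizer step.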

Code Repositories

namisan/mt-dnn (official, PyTorch)
cliang1453/camero (PyTorch)
microsoft/MT-DNN (PyTorch)
chunhuililili/mt_dnn (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
natural-language-inference-on-ax | T5 | Accuracy: 53.1
natural-language-inference-on-mnli-snli-anli | SMARTRoBERTa-LARGE | Dev Accuracy (%): 57.1; Test Accuracy (%): 57.1
natural-language-inference-on-multinli | SMART+BERT-BASE | Accuracy: 85.6
natural-language-inference-on-multinli | T5 | Matched: 92.0; Mismatched: 91.7
natural-language-inference-on-multinli | SMARTRoBERTa | Dev Matched: 91.1; Dev Mismatched: 91.3
natural-language-inference-on-multinli | MT-DNN-SMARTv0 | Accuracy: 85.7
natural-language-inference-on-multinli | SMART-BERT | Dev Matched: 85.6; Dev Mismatched: 86.0
natural-language-inference-on-multinli | MT-DNN-SMART | Accuracy: 85.7
natural-language-inference-on-qnli | SMART-BERT | -
natural-language-inference-on-qnli | ALICE | Accuracy: 99.2%
natural-language-inference-on-qnli | MT-DNN-SMART | Accuracy: 99.2%
natural-language-inference-on-qnli | SMARTRoBERTa | -
natural-language-inference-on-rte | T5-XXL 11B | Accuracy: 92.5%
natural-language-inference-on-rte | SMART-BERT | Accuracy: 71.2%
natural-language-inference-on-rte | SMARTRoBERTa | Accuracy: 92.0%
natural-language-inference-on-rte | SMART | Accuracy: 71.2%
natural-language-inference-on-scitail | MT-DNN-SMART_1%ofTrainingData | Dev Accuracy: 88.6
natural-language-inference-on-scitail | MT-DNN-SMARTLARGEv0 | Dev Accuracy (%): 96.6; Test Accuracy (%): 95.2
natural-language-inference-on-scitail | MT-DNN-SMART_0.1%ofTrainingData | Dev Accuracy: 82.3
natural-language-inference-on-scitail | MT-DNN-SMART_100%ofTrainingData | Dev Accuracy: 96.1
natural-language-inference-on-scitail | MT-DNN-SMART_10%ofTrainingData | Dev Accuracy: 91.3
natural-language-inference-on-snli | MT-DNN-SMART_0.1%ofTrainingData | Dev Accuracy: 82.7
natural-language-inference-on-snli | MT-DNN-SMARTLARGEv0 | Dev Accuracy (%): 92.6; Test Accuracy (%): 91.7
natural-language-inference-on-snli | MT-DNN-SMART_1%ofTrainingData | Dev Accuracy: 86
natural-language-inference-on-snli | MT-DNN-SMART_100%ofTrainingData | Dev Accuracy: 91.6
natural-language-inference-on-snli | MT-DNN-SMART_10%ofTrainingData | Dev Accuracy: 88.7
natural-language-understanding-on-glue | MT-DNN-SMART | Average: 89.9
paraphrase-identification-on-quora-question | SMART-BERT | Dev Accuracy: 91.5; Dev F1: 88.5
paraphrase-identification-on-quora-question | FreeLB | Accuracy: 74.8; Dev Accuracy: 92.6
paraphrase-identification-on-quora-question | ALICE | F1: 90.7
semantic-textual-similarity-on-mrpc | SMART-BERT | -
semantic-textual-similarity-on-mrpc | SMART | Accuracy: 91.3%
semantic-textual-similarity-on-mrpc | SMARTRoBERTa | -
semantic-textual-similarity-on-mrpc | MT-DNN-SMART | Accuracy: 93.7%; F1: 91.7
semantic-textual-similarity-on-sts-benchmark | SMART-BERT | Dev Pearson Correlation: 90.0; Dev Spearman Correlation: 89.4
semantic-textual-similarity-on-sts-benchmark | MT-DNN-SMART | Pearson Correlation: 0.929; Spearman Correlation: 0.925
semantic-textual-similarity-on-sts-benchmark | SMARTRoBERTa | Dev Pearson Correlation: 92.8; Dev Spearman Correlation: 92.6
sentiment-analysis-on-sst-2-binary | SMART+BERT-BASE | Accuracy: 93
sentiment-analysis-on-sst-2-binary | MT-DNN | Accuracy: 93.6
sentiment-analysis-on-sst-2-binary | SMART-MT-DNN | Dev Accuracy: 96.1
sentiment-analysis-on-sst-2-binary | SMART-BERT | Dev Accuracy: 93.0
sentiment-analysis-on-sst-2-binary | MT-DNN-SMART | Accuracy: 97.5
sentiment-analysis-on-sst-2-binary | SMARTRoBERTa | Dev Accuracy: 96.9
