R-Drop: Regularized Dropout for Neural Networks

Xiaobo Liang Lijun Wu Juntao Li Yue Wang Qi Meng Tao Qin Wei Chen Min Zhang Tie-Yan Liu

Abstract

Dropout is a powerful and widely used technique to regularize the training of deep neural networks. In this paper, we introduce a simple regularization strategy upon dropout in model training, namely R-Drop, which forces the output distributions of different sub models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the freedom of the model parameters and complements dropout. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performance with the vanilla Transformer model on WMT14 English→German translation (30.91 BLEU) and WMT14 English→French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available on GitHub: https://github.com/dropreg/R-Drop.
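The objective described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the helper names (`softmax`, `kl_divergence`, `dropout`, `forward`, `r_drop_loss`), the toy linear model, and the weighting coefficient `alpha` are assumptions made for illustration. In particular, the abstract does not say how the KL term is combined with the task loss; the sketch assumes the negative log-likelihood of both dropout passes plus `alpha` times the average of the two KL directions.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def forward(features, weights, rate, rng):
    """Hypothetical toy model: dropout on the input, then one linear layer."""
    hidden = dropout(features, rate, rng)
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


def r_drop_loss(logits_a, logits_b, label, alpha=1.0):
    """Task loss on both passes plus a bidirectional KL consistency term.

    `alpha` and the 0.5 averaging of the two KL directions are assumptions
    of this sketch, not details stated in the abstract.
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    nll = -(math.log(p[label]) + math.log(q[label]))
    bidirectional_kl = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return nll + alpha * bidirectional_kl


if __name__ == "__main__":
    rng = random.Random(0)
    features = [0.5, -1.2, 0.3, 0.8]
    weights = [[0.1, 0.4, -0.2, 0.3],
               [-0.3, 0.2, 0.5, -0.1],
               [0.2, -0.5, 0.1, 0.4]]
    # The same sample goes through the model twice; each pass draws its own
    # dropout mask, so the two passes act as two different sub models.
    logits_a = forward(features, weights, rate=0.3, rng=rng)
    logits_b = forward(features, weights, rate=0.3, rng=rng)
    print(r_drop_loss(logits_a, logits_b, label=1, alpha=1.0))
```

In a real training loop the same idea would apply per batch: two forward passes with dropout active, followed by one backward pass on the combined loss.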

Code Repositories

Repository	Framework	Notes
fushengwuyu/R-Drop	pytorch	Mentioned in GitHub
btobab/R-Drop	paddle	Mentioned in GitHub
wzh326/R-Drop	paddle	-
cosmoquester/2021-dialogue-summary-competition	pytorch	Mentioned in GitHub
dropreg/R-Drop	jax	Official; Mentioned in GitHub
bojone/r-drop	-	Mentioned in GitHub
zpc-666/Paddle-R-Drop	paddle	Mentioned in GitHub
zbp-xxxp/R-Drop-Paddle	paddle	-

Benchmarks

Benchmark	Methodology	Metrics
abstractive-text-summarization-on-cnn-daily	BART + R-Drop	ROUGE-1: 44.51 ROUGE-2: 21.58 ROUGE-L: 41.24
machine-translation-on-iwslt2014-german	Transformer + R-Drop	BLEU score: 37.25
machine-translation-on-iwslt2014-german	Transformer + R-Drop + Cutoff	BLEU score: 37.90
machine-translation-on-wmt2014-english-french	Transformer + R-Drop	BLEU score: 43.95
machine-translation-on-wmt2014-english-german	Transformer + R-Drop	BLEU score: 30.91 Hardware Burden: 49G
