R-Drop: Regularized Dropout for Neural Networks

Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu

Abstract

Dropout is a powerful and widely used technique to regularize the training of deep neural networks. In this paper, we introduce a simple regularization strategy built upon dropout in model training, namely R-Drop, which forces the output distributions of different sub-models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub-models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the freedom of the model parameters and complements dropout. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performance with the vanilla Transformer model on WMT14 English→German translation (30.91 BLEU) and WMT14 English→French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available at https://github.com/dropreg/R-Drop.
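
The objective described in the abstract can be made concrete with a short sketch: two forward passes through the same dropout-enabled model, a standard cross-entropy term for each pass, and a bidirectional KL-divergence term that pulls the two output distributions toward each other. The following is a minimal PyTorch illustration, not the authors' official implementation (which lives in the repositories listed below); the function name `r_drop_loss` and the KL weight `alpha` are illustrative assumptions.

```python
# Minimal sketch of an R-Drop-style training loss (not the official code).
# `model` is any classifier whose forward pass applies dropout, so two passes
# on the same input yield different sub-models; `alpha` is a hypothetical
# name for the weight on the KL term.
import torch.nn.functional as F


def r_drop_loss(model, x, y, alpha=1.0):
    # Two forward passes on the same batch; active dropout makes the two
    # sampled sub-models (and their output distributions) differ.
    logits1 = model(x)
    logits2 = model(x)

    # Standard cross-entropy averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y))

    # Bidirectional (symmetric) KL divergence between the two distributions.
    log_p = F.log_softmax(logits1, dim=-1)
    log_q = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (
        F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
        + F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    )
    return ce + alpha * kl
```

In training, this loss simply replaces the usual cross-entropy: compute `loss = r_drop_loss(model, x, y)` inside the training loop and call `loss.backward()` as normal, at the cost of one extra forward pass per batch.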

Code Repositories

fushengwuyu/R-Drop (pytorch) - Mentioned in GitHub
btobab/R-Drop (paddle) - Mentioned in GitHub
dropreg/R-Drop (jax) - Official, Mentioned in GitHub
bojone/r-drop (tf) - Mentioned in GitHub
zpc-666/Paddle-R-Drop (paddle) - Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
abstractive-text-summarization-on-cnn-daily | BART + R-Drop | ROUGE-1: 44.51, ROUGE-2: 21.58, ROUGE-L: 41.24
machine-translation-on-iwslt2014-german | Transformer + R-Drop | BLEU score: 37.25
machine-translation-on-iwslt2014-german | Transformer + R-Drop + Cutoff | BLEU score: 37.90
machine-translation-on-wmt2014-english-french | Transformer + R-Drop | BLEU score: 43.95
machine-translation-on-wmt2014-english-german | Transformer + R-Drop | BLEU score: 30.91, Hardware Burden: 49G
