Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu

Abstract
Dropout is a powerful and widely used technique to regularize the training of deep neural networks. In this paper, we introduce a simple regularization strategy built on dropout in model training, namely R-Drop, which forces the output distributions of different sub-models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub-models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the freedom of the model parameters and complements dropout. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performance with the vanilla Transformer model on WMT14 English→German translation (30.91 BLEU) and WMT14 English→French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available on GitHub: https://github.com/dropreg/R-Drop.
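The consistency term described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). The toy linear classifier, the helper names, the dropout rate, the weight `alpha`, and the 0.5 averaging factor are all assumptions made for illustration; the abstract itself only states that the bidirectional KL-divergence between two dropout sub-models is minimized.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions (eps guards log(0))."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def bidirectional_kl(p, q):
    """Symmetric consistency term: average of KL(p || q) and KL(q || p)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def forward(features, weights, rate, rng):
    """Hypothetical toy model: dropout on the inputs, then a linear layer and softmax."""
    dropped = dropout(features, rate, rng)
    logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
    return softmax(logits)


def r_drop_loss(features, label, weights, rate=0.3, alpha=1.0, rng=None):
    """Two stochastic passes over the same sample, plus a KL consistency penalty.

    Returns (total_loss, consistency_term). `alpha` is an assumed weight.
    """
    rng = rng or random.Random(0)
    p1 = forward(features, weights, rate, rng)  # first dropout sub-model
    p2 = forward(features, weights, rate, rng)  # second dropout sub-model
    nll = -(math.log(p1[label]) + math.log(p2[label]))
    consistency = bidirectional_kl(p1, p2)
    return nll + alpha * consistency, consistency


if __name__ == "__main__":
    W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # 2 classes, 3 input features
    loss, kl = r_drop_loss([1.0, 2.0, 3.0], label=1, weights=W, rate=0.5)
    print(f"total loss = {loss:.4f}, consistency term = {kl:.4f}")
```

With `rate=0.0` both passes use the same sub-model, so the consistency term vanishes and the loss reduces to the ordinary negative log-likelihood of the two (identical) passes.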
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| abstractive-text-summarization-on-cnn-daily | BART + R-Drop | ROUGE-1: 44.51; ROUGE-2: 21.58; ROUGE-L: 41.24 |
| machine-translation-on-iwslt2014-german | Transformer + R-Drop | BLEU score: 37.25 |
| machine-translation-on-iwslt2014-german | Transformer + R-Drop + Cutoff | BLEU score: 37.90 |
| machine-translation-on-wmt2014-english-french | Transformer + R-Drop | BLEU score: 43.95 |
| machine-translation-on-wmt2014-english-german | Transformer + R-Drop | BLEU score: 30.91; Hardware Burden: 49G |