Xiaodong Liu Kevin Duh Liyuan Liu Jianfeng Gao

Abstract
We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU and 46.4 BLEU with back-translation) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: https://github.com/namisan/exdeep-nmt.
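The initialization technique referenced in the abstract and in the benchmark entries below is ADMIN. A minimal sketch of an ADMIN-style reparameterized residual connection is shown here, assuming a per-dimension shortcut scale initialized from output statistics gathered in a profiling pass; the class, method, and parameter names are illustrative and not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class AdminResidual(nn.Module):
    """Illustrative ADMIN-style residual connection (sketch, not the official code).

    The sublayer output f(x) is added to the shortcut scaled by a learned
    per-dimension vector omega: y = LayerNorm(omega * x + f(x)).
    omega is initialized from output variances collected in a short
    profiling pass, which keeps the output variance of very deep stacks
    bounded at the start of training.
    """

    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer                         # e.g. self-attention or FFN block
        self.omega = nn.Parameter(torch.ones(d_model))   # rescaling of the shortcut branch
        self.norm = nn.LayerNorm(d_model)

    @torch.no_grad()
    def profile_init(self, accumulated_output_var: torch.Tensor) -> None:
        # After a forward profiling pass, set omega from the accumulated variance of
        # earlier sublayer outputs, so each layer's contribution stays balanced.
        self.omega.copy_(accumulated_output_var.sqrt())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(self.omega * x + self.sublayer(x))
```

After the profiling pass sets `omega`, training proceeds as with a standard post-LayerNorm Transformer, which is what allows encoder/decoder stacks of 60/12 layers to train stably.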
Benchmarks
| Benchmark | Method | BLEU | SacreBLEU | Params |
|---|---|---|---|---|
| Machine Translation on WMT2014 English-French | Transformer + BT (ADMIN init) | 46.4 | 44.4 | – |
| Machine Translation on WMT2014 English-French | Transformer (ADMIN init) | 43.8 | 41.8 | – |
| Machine Translation on WMT2014 English-German | Transformer (ADMIN init) | 30.1 | 29.5 | 256M |