José A. R. Fonollosa; Noe Casas; Marta R. Costa-jussà

Abstract
The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over source and target sequences. In this paper we study a new architecture that breaks with both conventions. Our simplified architecture consists of the decoder part of a transformer model, based on self-attention, but with locality constraints applied to the attention receptive field. During training, both the source and target sentences are fed to the network, which is trained as a language model. At inference time, the target tokens are predicted autoregressively, starting with the source sequence as previous tokens. The proposed model achieves a new state of the art of 35.7 BLEU on IWSLT'14 German-English and matches the best reported results in the literature on the WMT'14 English-German and WMT'14 English-French translation benchmarks.
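The following is a minimal sketch, not the authors' released code, of the setup the abstract describes: source and target tokens are concatenated into a single sequence, and a decoder-only transformer attends over it under a causal mask combined with a fixed locality window. The window size, the toy model dimensions, and the use of PyTorch's `nn.TransformerEncoder` with a causal mask as a stand-in for a decoder-only stack are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of joint source-target self-attention with a locality constraint.
# Assumed/hypothetical: window size, vocab size, model dimensions, layer counts.
import torch
import torch.nn as nn


def local_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend to j only if j <= i and i - j < window."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (i - j < window)


class JointLocalSelfAttentionLM(nn.Module):
    """Decoder-only language model over the concatenated [source; target] sequence."""

    def __init__(self, vocab_size=1000, d_model=256, n_heads=4, n_layers=4, window=16):
        super().__init__()
        self.window = window
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.layers = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len), source tokens followed by target tokens
        seq_len = tokens.size(1)
        allowed = local_causal_mask(seq_len, self.window).to(tokens.device)
        # TransformerEncoder expects True where attention is *blocked*
        attn_mask = ~allowed
        h = self.layers(self.embed(tokens), mask=attn_mask)
        return self.out(h)  # next-token logits, trained with a standard LM loss


# Usage: concatenate source and target and train as a language model; at inference,
# feed the source tokens and decode the target autoregressively from that prefix.
model = JointLocalSelfAttentionLM()
src = torch.randint(0, 1000, (2, 12))
tgt = torch.randint(0, 1000, (2, 10))
logits = model(torch.cat([src, tgt], dim=1))
print(logits.shape)  # (2, 22, 1000)
```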
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| Machine Translation on IWSLT'14 German-English | Local Joint Self-attention | BLEU score: 35.7 |
| Machine Translation on WMT'14 English-French | Local Joint Self-attention | BLEU score: 43.3 |
| Machine Translation on WMT'14 English-German | Local Joint Self-attention | BLEU score: 29.7 |