Unsupervised Statistical Machine Translation

Mikel Artetxe; Gorka Labaka; Eneko Agirre

Abstract

While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018). Despite the potential of this approach for low-resource settings, existing systems are far behind their supervised counterparts, limiting their practical interest. In this paper, we propose an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems. Our method profits from the modular architecture of SMT: we first induce a phrase table from monolingual corpora through cross-lingual embedding mappings, combine it with an n-gram language model, and fine-tune hyperparameters through an unsupervised MERT variant. In addition, iterative backtranslation improves results further, yielding, for instance, 14.08 and 26.22 BLEU points in WMT 2014 English-German and English-French, respectively, an improvement of more than 7-10 BLEU points over previous unsupervised systems, and closing the gap with supervised SMT (Moses trained on Europarl) down to 2-5 BLEU points. Our implementation is available at https://github.com/artetxem/monoses
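The phrase-table induction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that source and target embeddings have already been mapped into a shared cross-lingual space, and that translation probabilities are obtained by a softmax over cosine similarities restricted to the nearest target candidates. The function names, the `temperature` and `top_k` parameters, and the toy vectors are hypothetical and chosen for illustration only; this is not the monoses implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_phrase_table(src_emb, tgt_emb, temperature=0.1, top_k=2):
    """Build a toy phrase table from embeddings in a shared space.

    For each source entry, keep the top_k most similar target entries and
    turn their cosine similarities into probabilities with a softmax.
    `temperature` and `top_k` are illustrative assumptions.
    """
    table = {}
    for src, u in src_emb.items():
        sims = {tgt: cosine(u, v) for tgt, v in tgt_emb.items()}
        nearest = sorted(sims, key=sims.get, reverse=True)[:top_k]
        # Subtract the maximum similarity for numerical stability.
        best = max(sims[t] for t in nearest)
        weights = {t: math.exp((sims[t] - best) / temperature) for t in nearest}
        total = sum(weights.values())
        table[src] = {t: w / total for t, w in weights.items()}
    return table


# Toy, hand-made embeddings assumed to live in one cross-lingual space.
src_emb = {"house": [1.0, 0.1], "dog": [0.1, 1.0]}
tgt_emb = {"maison": [0.9, 0.2], "chien": [0.2, 0.9], "chat": [0.3, 0.8]}

phrase_table = induce_phrase_table(src_emb, tgt_emb)
for src, candidates in phrase_table.items():
    print(src, candidates)
```

In a full SMT pipeline, scores of this kind would be combined with an n-gram language model and tuned weights; the sketch covers only the scoring step.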

Code Repositories

artetxem/phrase2vec (Mentioned in GitHub)
artetxem/monoses (Official; pytorch; Mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
machine-translation-on-wmt2014-english-french | SMT + iterative backtranslation (unsupervised) | BLEU score: 26.22
machine-translation-on-wmt2014-english-german | SMT + iterative backtranslation (unsupervised) | BLEU score: 14.08
machine-translation-on-wmt2014-french-english | SMT + iterative backtranslation (unsupervised) | BLEU score: 25.87
machine-translation-on-wmt2014-german-english | SMT + iterative backtranslation (unsupervised) | BLEU score: 17.43
machine-translation-on-wmt2016-english-german | SMT + iterative backtranslation (unsupervised) | BLEU score: 18.23
machine-translation-on-wmt2016-german-english | SMT + iterative backtranslation (unsupervised) | BLEU score: 23.05
unsupervised-machine-translation-on-wmt2014-1 | SMT | BLEU: 25.9
