HyperAI

The Evolved Transformer

David R. So, Chen Liang, Quoc V. Le

Abstract

Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by the recent advances in feed-forward sequence models and then run evolutionary architecture search with warm starting by seeding our initial population with the Transformer. To directly search on the computationally expensive WMT 2014 English-German translation task, we develop the Progressive Dynamic Hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments -- the Evolved Transformer -- demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech, and LM1B. At a big model size, the Evolved Transformer establishes a new state-of-the-art BLEU score of 29.8 on WMT'14 English-German; at smaller sizes, it achieves the same quality as the original "big" Transformer with 37.6% fewer parameters and outperforms the Transformer by 0.7 BLEU at a mobile-friendly model size of 7M parameters.
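The Progressive Dynamic Hurdles idea described above can be sketched as follows. Every candidate gets a small initial training budget, and only candidates whose fitness clears a dynamically computed hurdle earn the next budget increment. The function names, the toy fitness function, and the mean-fitness hurdle rule below are illustrative assumptions, not the authors' implementation:

```python
def toy_fitness(genome, steps):
    # Stand-in for training a candidate model and measuring validation
    # performance: fitness rises with training steps and saturates, and
    # "better" genomes (larger values in this toy) reach higher fitness.
    return genome * (1 - 1 / (1 + 0.01 * steps))

def progressive_dynamic_hurdles(population, stage_steps):
    """Return (fitness, steps) after allocating budget stage by stage.

    population  -- candidate "genomes" (plain numbers in this toy sketch)
    stage_steps -- training-step budget per stage; stage 0 is given to
                   every candidate, later stages only to candidates at
                   or above the current hurdle.
    """
    steps = {g: stage_steps[0] for g in population}
    fitness = {g: toy_fitness(g, steps[g]) for g in population}
    for extra in stage_steps[1:]:
        # Hurdle: mean fitness of all candidates evaluated so far.
        hurdle = sum(fitness.values()) / len(fitness)
        for g in population:
            if fitness[g] >= hurdle:        # promising: train further
                steps[g] += extra
                fitness[g] = toy_fitness(g, steps[g])
    return fitness, steps

# Warm start: seed the population with a "Transformer-like" genome plus
# mutated variants, then let the hurdles concentrate compute on the best.
fitness, steps = progressive_dynamic_hurdles([0.5, 0.7, 0.9, 1.0],
                                             [100, 100, 100])
```

In the actual search, fitness is validation performance on WMT'14 English-German and hurdles are recomputed as new candidates finish each stage; the sketch only mirrors the mechanism of allocating more resources to more promising candidates.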

Code Repositories

nazarov-yuriy/zh-ru-shared-task (TensorFlow)
tensorflow/tensor2tensor (TensorFlow, official implementation)
moon23k/Transformer_Archs (PyTorch)

Benchmarks

Benchmark                                       Method                     Metrics
language-modelling-on-one-billion-word          Evolved Transformer Big    PPL: 28.6
machine-translation-on-wmt2014-english-czech    Evolved Transformer Base   BLEU score: 27.6
machine-translation-on-wmt2014-english-czech    Evolved Transformer Big    BLEU score: 28.2
machine-translation-on-wmt2014-english-french   Evolved Transformer Big    BLEU score: 41.3
machine-translation-on-wmt2014-english-french   Evolved Transformer Base   BLEU score: 40.6
machine-translation-on-wmt2014-english-german   Evolved Transformer Big    BLEU score: 29.8; SacreBLEU: 29.2; Number of Params: 218M
machine-translation-on-wmt2014-english-german   Evolved Transformer Base   BLEU score: 28.4; Hardware Burden: 2488G
