HyperAI


Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation

Xiang Kong, Qizhe Xie, Zihang Dai, Eduard Hovy

Abstract

Mixture of Softmaxes (MoS) has been shown to be effective at addressing the expressiveness limitation of Softmax-based models. Despite this known advantage, MoS is held back in practice by its large memory and computation cost, which stems from the need to compute multiple Softmaxes. In this work, we set out to unleash the power of MoS in practical applications by investigating improved word coding schemes, which can effectively reduce the vocabulary size and hence relieve the memory and computation burden. We show that both BPE and our proposed Hybrid-LightRNN lead to improved encoding mechanisms that can halve the time and memory consumption of MoS without performance loss. With MoS, we achieve an improvement of 1.5 BLEU points on the IWSLT 2014 German-to-English corpus and an improvement of 0.76 CIDEr on image captioning. Moreover, on the larger WMT 2014 machine translation dataset, our MoS-boosted Transformer yields 29.5 BLEU for English-to-German and 42.1 BLEU for English-to-French, outperforming the single-Softmax Transformer by 0.8 and 0.4 BLEU points respectively and achieving the state-of-the-art result on the WMT 2014 English-to-German task.
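For readers unfamiliar with the MoS output layer the abstract builds on, the idea is to replace the single Softmax over the vocabulary with a convex combination of K Softmaxes, each computed from its own projection of the context vector. The sketch below is an illustrative NumPy implementation under assumed shapes and variable names (it is not the authors' code); the mixing weights, per-component projections, and shared output embedding follow the standard MoS formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mos_probs(h, comp_W, comp_b, prior_W, out_W):
    """Mixture of Softmaxes over the vocabulary (illustrative sketch).

    h       : (d,)       context vector from the decoder
    comp_W  : (K, d, d)  per-component projection matrices
    comp_b  : (K, d)     per-component biases
    prior_W : (K, d)     projection producing the K mixing logits
    out_W   : (V, d)     shared output word embeddings
    """
    # Mixing weights pi_k(h): a distribution over the K components.
    pi = softmax(prior_W @ h)                 # (K,)
    # Per-component context vectors h_k = tanh(W_k h + b_k).
    hk = np.tanh(comp_W @ h + comp_b)         # (K, d)
    # Each component yields its own softmax over the vocabulary --
    # this is the K-fold cost that BPE / Hybrid-LightRNN shrink by
    # reducing the effective vocabulary size V.
    comp = softmax(hk @ out_W.T, axis=-1)     # (K, V)
    # Final word distribution: convex combination of the K softmaxes.
    return pi @ comp                          # (V,)
```

Because the final distribution is a weighted mixture rather than a single Softmax, its log cannot be written as one low-rank logit matrix, which is the source of MoS's extra expressiveness and, equally, of its extra compute.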

Code Repositories

shawnkx/Fast-MoS (TensorFlow)

Benchmarks

Benchmark: Machine Translation on WMT 2014 English-French
Methodology: Transformer Big + MoS
BLEU score: 42.1

Benchmark: Machine Translation on WMT 2014 English-German
Methodology: Transformer Big + MoS
BLEU score: 29.6

