HyperAI

Regularizing and Optimizing LSTM Language Models

Stephen Merity, Nitish Shirish Keskar, Richard Socher

Abstract

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization. Further, we introduce NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user. Using these and other regularization strategies, we achieve state-of-the-art word-level perplexities on two data sets: 57.3 on Penn Treebank and 65.8 on WikiText-2. In exploring the effectiveness of a neural cache in conjunction with our proposed model, we achieve an even lower state-of-the-art perplexity of 52.8 on Penn Treebank and 52.0 on WikiText-2.
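The weight-dropped LSTM described above applies DropConnect to the recurrent (hidden-to-hidden) weight matrix rather than dropping activations. The sketch below is a minimal NumPy illustration of that idea, not the authors' salesforce/awd-lstm-lm code; function names, the gate ordering, and the dropout probability `p` are illustrative. The key point it demonstrates is that one mask is sampled per forward pass and reused at every timestep, which is what lets the technique sit on top of fused RNN kernels unchanged.

```python
import numpy as np

def drop_connect(w, p, rng):
    """DropConnect: zero each weight independently with probability p and
    rescale survivors by 1/(1-p) so the expected weight is unchanged."""
    mask = rng.random(w.shape) >= p
    return w * mask / (1.0 - p)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: input-to-hidden (4H x D), U: hidden-to-hidden
    (4H x H), b: bias (4H,); gate order i, f, g, o."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
    g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
    c = f * c + i * g
    return o * np.tanh(c), c

def weight_dropped_forward(xs, h, c, W, U, b, p=0.5, seed=0):
    # Sample ONE DropConnect mask for the hidden-to-hidden matrix and
    # reuse it across the whole sequence (the weight-dropped LSTM idea):
    # the per-step recurrence itself is left untouched.
    U_drop = drop_connect(U, p, np.random.default_rng(seed))
    for x in xs:
        h, c = lstm_step(x, h, c, W, U_drop, b)
    return h, c
```

At test time the mask is simply omitted (use `U` directly); the 1/(1-p) rescaling during training keeps the two regimes consistent in expectation.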

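The NT-ASGD optimizer mentioned in the abstract replaces a hand-tuned averaging trigger with a non-monotonic condition on validation loss: run plain SGD, check validation loss periodically, and start averaging iterates once the latest loss fails to beat the best loss seen more than `n` checks earlier. The control logic below is a minimal sketch under one plausible reading of that trigger; the window `n`, the check schedule, and the class/method names are illustrative, not the paper's reference implementation.

```python
class NTASGD:
    """Sketch of NT-ASGD's control flow: plain SGD until the non-monotonic
    trigger fires, then maintain a running average of the iterates (the
    averaged weights are what get evaluated and deployed)."""

    def __init__(self, n=5):
        self.n = n          # non-monotonicity window (checks, not steps)
        self.losses = []    # one validation loss per check, newest last
        self.k = 0          # number of iterates averaged so far
        self.avg = None     # running average of iterates

    def should_average(self, val_loss):
        """Record a validation check; return True once averaging should
        begin: the newest loss is worse than the best loss observed more
        than n checks ago."""
        self.losses.append(val_loss)
        return (len(self.losses) > self.n
                and self.losses[-1] > min(self.losses[:-self.n]))

    def accumulate(self, w):
        """Incremental average of iterates after the trigger fires."""
        self.k += 1
        self.avg = w if self.avg is None else self.avg + (w - self.avg) / self.k
        return self.avg
```

Because the trigger only compares validation losses, it adds no hyperparameter beyond `n`; averaging then proceeds exactly as in classical ASGD, just started late.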
Code Repositories

- chris-tng/semi-supervised-nlp (PyTorch)
- mamamot/Russian-ULMFit (PyTorch)
- jkkummerfeld/emnlp20lm (PyTorch)
- alexandra-chron/wassa-2018 (PyTorch)
- llppff/ptb-lstmorqrnn-pytorch (PyTorch)
- mnhng/hier-char-emb (PyTorch)
- BenjiKCF/AWD-LSTM-sentiment-classifier (PyTorch)
- cstorm125/thai2fit (PyTorch)
- S-Abdelnabi/awt (PyTorch)
- prajjwal1/language-modelling (PyTorch)
- Han-JD/GRU-D (PyTorch)
- uclanlp/NamedEntityLanguageModel (PyTorch)
- NightmareVoid/LSTM_for_EEG (PyTorch)
- AtheMathmo/lookahead-lstm (PyTorch)
- uchange/ulangel (PyTorch)
- JessikaSmith/language_model (TensorFlow)
- jb33k/awd-lstm-lm-ThinkNet (PyTorch)
- castorini/hedwig (PyTorch)
- ahmetumutdurmus/awd-lstm (PyTorch)
- muellerzr/CodeFest_2019
- SachinIchake/KALM (PyTorch)
- alexandra-chron/ntua-slp-wassa-iest2018 (PyTorch)
- nkcr/overlap-ml (PyTorch)
- varshinireddyt/ULMFiT
- philippwirth/treelangrnn (PyTorch)
- Janus-Shiau/awd-lstm-tensorflow (TensorFlow)
- iwangjian/ByteCup2018 (PyTorch)
- noise-field/Russian-ULMFit (PyTorch)
- arvieFrydenlund/awd-lstm-lm (PyTorch)
- philippwirth/awd-lstm-test (PyTorch)
- ari-holtzman/genlm (PyTorch)
- AtheMathmo/AggMo (PyTorch)
- soyoung97/awd-lstm-gru (PyTorch)
- Mees-Molenaar/protein_location (PyTorch)
- salesforce/awd-lstm-lm (Official, PyTorch)
- castorini/Castor (PyTorch)

Benchmarks

| Benchmark | Method | Params | Test perplexity | Validation perplexity |
|---|---|---|---|---|
| Language modelling on Penn Treebank (word-level) | AWD-LSTM + continuous cache pointer | 24M | 52.8 | 53.9 |
| Language modelling on Penn Treebank (word-level) | AWD-LSTM | 24M | 57.3 | 60.0 |
| Language modelling on WikiText-2 | AWD-LSTM + continuous cache pointer | 33M | 52.0 | 53.8 |
| Language modelling on WikiText-2 | AWD-LSTM | 33M | 65.8 | 68.6 |
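The "+ continuous cache pointer" rows combine the base model with a neural cache (Grave et al.): the model's softmax is interpolated with a distribution built from recently seen hidden states, boosting words that recently appeared in similar contexts. A minimal NumPy sketch of that interpolation is below; the temperature `theta` and mixing weight `lam` are illustrative hyperparameters, not the tuned values behind the numbers above.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cache_distribution(h_t, cache_h, cache_words, vocab_size, theta=0.3):
    """Neural-cache distribution: weight each cached hidden state h_i by
    exp(theta * h_t . h_i), then accumulate that weight on the word that
    followed h_i. Returns a probability vector over the vocabulary."""
    scores = np.array([theta * (h_t @ h_i) for h_i in cache_h])
    weights = softmax(scores)
    p = np.zeros(vocab_size)
    for w, wt in zip(cache_words, weights):
        p[w] += wt
    return p

def cache_lm_probs(p_vocab, p_cache, lam=0.1):
    # Final next-word distribution: interpolate the model softmax with
    # the cache distribution.
    return (1.0 - lam) * p_vocab + lam * p_cache
```

Because the cache is built purely from stored hidden states and words, it requires no retraining of the base model, which is why it stacks cleanly on top of the AWD-LSTM results in the table.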
