Stephen Merity; Nitish Shirish Keskar; Richard Socher

Abstract
Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization. Further, we introduce NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user. Using these and other regularization strategies, we achieve state-of-the-art word-level perplexities on two data sets: 57.3 on Penn Treebank and 65.8 on WikiText-2. In exploring the effectiveness of a neural cache in conjunction with our proposed model, we achieve an even lower state-of-the-art perplexity of 52.8 on Penn Treebank and 52.0 on WikiText-2.
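The two ideas named in the abstract can be illustrated compactly. The first sketch below shows the weight-dropped idea: DropConnect zeroes individual entries of the hidden-to-hidden weight matrix (rather than activations), with one mask sampled per forward pass and reused across all timesteps of the recurrence. This is a minimal, illustrative cell written for clarity, not the authors' exact implementation; all class and parameter names here are our own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTMCell(nn.Module):
    """Sketch of DropConnect on the hidden-to-hidden weights of one LSTM layer."""

    def __init__(self, input_size, hidden_size, weight_dropout=0.5):
        super().__init__()
        self.hidden_size = hidden_size
        self.weight_dropout = weight_dropout
        self.weight_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size) * 0.1)
        self.weight_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))

    def forward(self, x, state=None):
        # x: (batch, seq_len, input_size)
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size) if state is None else state[0]
        c = x.new_zeros(batch, self.hidden_size) if state is None else state[1]

        # DropConnect: drop individual hidden-to-hidden *weights*, not activations.
        # One mask per forward pass, shared across every timestep of the sequence.
        w_hh = F.dropout(self.weight_hh, p=self.weight_dropout, training=self.training)

        outputs = []
        for t in range(seq_len):
            gates = x[:, t] @ self.weight_ih.t() + h @ w_hh.t() + self.bias
            i, f, g, o = gates.chunk(4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outputs.append(h)
        return torch.stack(outputs, dim=1), (h, c)
```

The second sketch illustrates the NT-ASGD trigger: train with plain SGD, track a validation metric, and switch to averaged SGD once the metric stops improving relative to the best value seen at least `n` checks ago. The helper name and the choice of `n` are illustrative assumptions; the switch to `torch.optim.ASGD` is one convenient way to start Polyak-style averaging in PyTorch.

```python
import torch

def nt_asgd_should_average(val_losses, n=5):
    # Non-monotonic trigger (sketch): begin averaging when the latest validation
    # loss is no better than the best loss observed at least n evaluations ago.
    if len(val_losses) <= n:
        return False
    return val_losses[-1] > min(val_losses[:-n])

# Illustrative usage inside a training loop (model and lr are assumed to exist):
# optimizer = torch.optim.SGD(model.parameters(), lr=lr)
# val_losses = []
# for epoch in range(epochs):
#     train_one_epoch(model, optimizer)          # hypothetical helper
#     val_losses.append(evaluate(model))          # hypothetical helper
#     if isinstance(optimizer, torch.optim.SGD) and nt_asgd_should_average(val_losses):
#         optimizer = torch.optim.ASGD(model.parameters(), lr=lr, t0=0, lambd=0.0)
```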
Benchmarks
| Benchmark | Method | Params | Validation perplexity | Test perplexity |
|---|---|---|---|---|
| language-modelling-on-penn-treebank-word | AWD-LSTM + continuous cache pointer | 24M | 53.9 | 52.8 |
| language-modelling-on-penn-treebank-word | AWD-LSTM | 24M | 60.0 | 57.3 |
| language-modelling-on-wikitext-2 | AWD-LSTM + continuous cache pointer | 33M | 53.8 | 52.0 |
| language-modelling-on-wikitext-2 | AWD-LSTM | 33M | 68.6 | 65.8 |