An Analysis of Neural Language Modeling at Multiple Scales

Stephen Merity; Nitish Shirish Keskar; Richard Socher

Abstract

Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies and character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.

Code Repositories

llppff/ptb-lstmorqrnn-pytorch (PyTorch, mentioned in GitHub)
mnhng/hier-char-emb (PyTorch, mentioned in GitHub)
Han-JD/GRU-D (PyTorch, mentioned in GitHub)
AtheMathmo/lookahead-lstm (PyTorch, mentioned in GitHub)
jb33k/awd-lstm-lm-ThinkNet (PyTorch, mentioned in GitHub)
SachinIchake/KALM (PyTorch, mentioned in GitHub)
philippwirth/treelangrnn (PyTorch, mentioned in GitHub)
ari-holtzman/genlm (PyTorch, mentioned in GitHub)
arvieFrydenlund/awd-lstm-lm (PyTorch, mentioned in GitHub)
philippwirth/awd-lstm-test (PyTorch, mentioned in GitHub)
soyoung97/awd-lstm-gru (PyTorch, mentioned in GitHub)
salesforce/awd-lstm-lm (official, PyTorch, mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| language-modelling-on-enwiki8 | AWD-LSTM (3 layers) | Bit per Character (BPC): 1.232; Number of params: 47M |
| language-modelling-on-hutter-prize | 3-layer AWD-LSTM | Bit per Character (BPC): 1.232; Number of params: 47M |
| language-modelling-on-penn-treebank-character | 6-layer QRNN | Bit per Character (BPC): 1.187; Number of params: 13.8M |
| language-modelling-on-penn-treebank-character | 3-layer AWD-LSTM | Bit per Character (BPC): 1.175; Number of params: 13.8M |
| language-modelling-on-wikitext-103 | 4 layer QRNN | Number of params: 151M; Test perplexity: 33.0; Validation perplexity: 32.0 |
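
The two metrics reported in the benchmarks, bits per character (BPC) and perplexity, are both transformations of the same quantity: the mean negative log-likelihood the model assigns to the true next token. The following is a minimal sketch of that relationship, assuming the loss is measured in nats (natural logarithm), as is common for cross-entropy losses. The function names are illustrative and are not taken from the paper or its repositories.

```python
import math


def mean_nll(true_token_probs):
    """Mean negative log-likelihood in nats.

    `true_token_probs` holds the probability the model assigned to each
    correct next token (character or word). Hypothetical helper for
    illustration only.
    """
    return -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)


def bits_per_character(nll_nats):
    """Convert a mean per-character loss from nats to bits (BPC)."""
    return nll_nats / math.log(2)


def perplexity(nll_nats):
    """Perplexity is the exponential of the mean per-token loss in nats."""
    return math.exp(nll_nats)


# A model that gives probability 0.5 to every correct character
# scores approximately 1.0 bit per character.
char_loss = mean_nll([0.5, 0.5, 0.5])
print(bits_per_character(char_loss))

# A model that gives probability 0.25 to every correct word
# has a perplexity of approximately 4.
word_loss = mean_nll([0.25, 0.25, 0.25, 0.25])
print(perplexity(word_loss))
```

Lower is better for both metrics. Character-level results are conventionally reported in BPC and word-level results in perplexity, which is why the table mixes the two.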
