Language Modelling On Enwiki8

Evaluation Metrics

Bit per Character (BPC)
Number of params
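
For reference, BPC is the model's average per-character cross-entropy expressed in bits (cross-entropy in nats divided by ln 2). The snippet below is a minimal illustrative sketch of that conversion, not part of the benchmark tooling; the helper name `bits_per_character` is hypothetical.

```python
import math

def bits_per_character(total_nll_nats: float, num_chars: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a character
    sequence into bits per character (BPC)."""
    # BPC = NLL / (num_chars * ln 2): the average per-character
    # cross-entropy, expressed in bits instead of nats.
    return total_nll_nats / (num_chars * math.log(2))

# Example: an average per-character cross-entropy of 0.74 nats
# corresponds to 0.74 / ln 2 ≈ 1.07 BPC.
print(bits_per_character(total_nll_nats=0.74 * 1_000_000, num_chars=1_000_000))
```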

Evaluation Results

Performance of the various models on this benchmark:

| Model | BPC | Number of params | Paper Title |
| --- | --- | --- | --- |
| LSTM (7 layers) | 1.67 | - | Generating Sequences With Recurrent Neural Networks |
| Hypernetworks | 1.34 | 27M | HyperNetworks |
| SHA-LSTM (4 layers, h=1024, no attention head) | 1.33 | 51M | Single Headed Attention RNN: Stop Thinking With Your Head |
| LN HM-LSTM | 1.32 | 35M | Hierarchical Multiscale Recurrent Neural Networks |
| ByteNet | 1.31 | - | Neural Machine Translation in Linear Time |
| Recurrent Highway Networks | 1.27 | 46M | Recurrent Highway Networks |
| Large FS-LSTM-4 | 1.25 | 47M | Fast-Slow Recurrent Neural Networks |
| Large mLSTM | 1.24 | 46M | Multiplicative LSTM for sequence modelling |
| AWD-LSTM (3 layers) | 1.232 | 47M | An Analysis of Neural Language Modeling at Multiple Scales |
| Cluster-Former (#C=512) | 1.22 | - | Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding |
| LSTM | 1.195 | 48M | Mogrifier LSTM |
| Mogrifier LSTM | 1.146 | 48M | Mogrifier LSTM |
| 64-layer Character Transformer Model | 1.11 | 44M | Character-Level Language Modeling with Deeper Self-Attention |
| SHA-RNN (4 layers, h=1024, single attention head) | 1.076 | 52M | Single Headed Attention RNN: Stop Thinking With Your Head |
| SHA-RNN (4 layers, h=1024, attention head per layer) | 1.068 | 54M | Single Headed Attention RNN: Stop Thinking With Your Head |
| Transformer-XL (12 layers) | 1.06 | 41M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| Transformer (64 layers) | 1.06 | 235M | Character-Level Language Modeling with Deeper Self-Attention |
| Skip Cross-Head Transformer-XL | 1.033 | 41M | Memory-efficient Stochastic methods for Memory-based Transformers |
| Transformer-XL (18 layers) | 1.03 | 88M | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context |
| Transformer+SSA | 1.024 | - | The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles |