Language Modelling On Text8

Evaluation Metric

Bits per Character (BPC)
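BPC is the average number of bits a model needs to encode each character of the evaluation text (the base-2 cross-entropy per character); lower is better. Below is a minimal sketch of the computation, assuming you already have the model's natural-log probabilities for the ground-truth character at each position; the function and variable names are illustrative and not taken from any of the listed papers.

```python
import math

def bits_per_character(log_probs):
    """Average negative base-2 log-probability assigned to the ground-truth
    characters; lower is better. `log_probs` holds natural-log probabilities
    of the correct character at each position."""
    return -sum(log_probs) / (len(log_probs) * math.log(2))

# Example: a model that assigns probability 0.5 to every character
# needs exactly 1 bit per character.
uniform_coin = [math.log(0.5)] * 100
print(bits_per_character(uniform_coin))  # 1.0
```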

Evaluation Results

Performance of the models on this benchmark

| Model | BPC | Paper Title | Repository |
|---|---|---|---|
| td-LSTM (Zhang et al., 2016) | 1.63 | Architectural Complexity Measures of Recurrent Neural Networks | - |
| td-LSTM-large | 1.49 | Architectural Complexity Measures of Recurrent Neural Networks | - |
| BFN | 1.41 | Bayesian Flow Networks | |
| Unregularised mLSTM | 1.40 | Multiplicative LSTM for sequence modelling | |
| BN LSTM | 1.36 | Recurrent Batch Normalization | |
| LayerNorm HM-LSTM | 1.29 | Hierarchical Multiscale Recurrent Neural Networks | |
| Large mLSTM +emb +WN +VD | 1.27 | Multiplicative LSTM for sequence modelling | |
| Large RHN | 1.27 | Recurrent Highway Networks | |
| Bipartite flows (8 flows) | 1.23 | Discrete Flows: Invertible Generative Models of Discrete Data | |
| mLSTM + dynamic eval | 1.19 | Dynamic Evaluation of Neural Sequence Models | |
| 12-layer Character Transformer Model | 1.18 | Character-Level Language Modeling with Deeper Self-Attention | |
| PAR Transformer 24B | 1.18 | Pay Attention when Required | |
| GAM-RHN-10 | 1.157 | Recurrent Highway Networks with Grouped Auxiliary Memory | - |
| 64-layer Character Transformer Model | 1.13 | Character-Level Language Modeling with Deeper Self-Attention | |
| 12L Transformer + 8K adaptive span | 1.11 | Adaptive Attention Span in Transformers | |
| BP-Transformer - 12 Layers | 1.11 | BP-Transformer: Modelling Long-Range Context via Binary Partitioning | |
| All-attention network - 18 layers | 1.11 | Augmenting Self-attention with Persistent Memory | |
| Transformer-LS (small) | 1.09 | Long-Short Transformer: Efficient Transformers for Language and Vision | |
| All-attention network - 36 layers | 1.08 | Augmenting Self-attention with Persistent Memory | |
| Transformer-XL - 24 layers | 1.08 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |