Language Modelling On Wikitext 103

Evaluation Metrics

Number of params
Test perplexity
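Test perplexity here is the standard token-level perplexity on the WikiText-103 test set: the exponential of the average negative log-likelihood per token under the model. A minimal sketch of that computation, assuming natural-log token probabilities (the `perplexity` helper below is illustrative and not taken from any of the listed papers):

```python
import math

def perplexity(token_log_probs):
    """Exp of the average negative log-likelihood (natural log) per token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Example: a model assigning probability 0.05 to every test token
# scores exp(-ln(0.05)) ≈ 20.0 perplexity.
print(perplexity([math.log(0.05)] * 1000))
```

Lower perplexity is better; "Number of params" is simply the model's parameter count (e.g. 124M for GPT-2 Small).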

Evaluation Results

Performance of the various models on this benchmark.

| Model | Number of params | Test perplexity | Paper Title | Repository |
|---|---|---|---|---|
| LSTM | - | 48.7 | Improving Neural Language Models with a Continuous Cache | |
| Temporal CNN | - | 45.2 | Convolutional Sequence Modeling Revisited | - |
| TCN | - | 45.19 | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling | |
| GCNN-8 | - | 44.9 | Language Modeling with Gated Convolutional Networks | |
| Neural cache model (size = 100) | - | 44.8 | Improving Neural Language Models with a Continuous Cache | |
| Neural cache model (size = 2,000) | - | 40.8 | Improving Neural Language Models with a Continuous Cache | |
| GPT-2 Small | 124M | 37.50 | Language Models are Unsupervised Multitask Learners | - |
| GCNN-8 | - | 37.2 | Language Modeling with Gated Convolutional Networks | |
| LSTM | - | 36.4 | Fast Parametric Learning with Activation Memorization | - |
| LSTM (Hebbian) | - | 34.3 | Fast Parametric Learning with Activation Memorization | - |
| 4 layer QRNN | 151M | 33.0 | An Analysis of Neural Language Modeling at Multiple Scales | |
| AWD-LSTM-MoS + ATOI | - | 32.85 | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes | |
| DEQ-Transformer (small) | 138M | 32.4 | Deep Equilibrium Models | |
| LSTM (RMC) | - | 31.6 | Relational recurrent neural networks | |
| Primal.+Trans. | - | 31.0 | Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation | |
| Rfa-Gate-Gaussian-Stateful (Small) | - | 30.5 | Random Feature Attention | - |
| LSTM (Hebbian, Cache) | - | 29.7 | Fast Parametric Learning with Activation Memorization | - |
| LSTM (Hebbian, Cache, MbPA) | - | 29.2 | Fast Parametric Learning with Activation Memorization | - |
| Trellis Network | - | 29.19 | Trellis Networks for Sequence Modeling | |
| DEQ-TrellisNet | 180M | 29.0 | Deep Equilibrium Models | |