Language Modelling On One Billion Word

Evaluation Metrics

Number of params
PPL
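
PPL here is perplexity on the benchmark's held-out test set; lower is better. As a quick reference, the sketch below (illustrative only, not tied to any particular submission on this leaderboard) shows how perplexity is derived from per-token log-probabilities; the function name is ours.

```python
import math

def perplexity(token_log_probs):
    """Corpus perplexity from per-token natural-log probabilities log p(w_i | context)."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)  # average negative log-likelihood per token
    return math.exp(avg_nll)                                 # PPL = exp(average NLL); lower is better

# A model that assigns probability 0.1 to every token has perplexity 10,
# i.e. it is as uncertain as a uniform choice over 10 words.
print(perplexity([math.log(0.1)] * 4))  # -> ~10.0
```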

Evaluation Results

Performance of each model on this benchmark.

| Model | Number of params | PPL | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| Sparse Non-Negative | 33B | 52.9 | Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation | - |
| RNN-1024 + 9 Gram | 20B | 51.3 | One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling | |
| GPT-2 | 1.54B | 42.16 | Language Models are Unsupervised Multitask Learners | - |
| BIG G-LSTM-2 | - | 36.0 | Factorization tricks for LSTM networks | |
| Low-Budget MoE | 5B | 34.1 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | |
| GCNN-14 bottleneck | - | 31.9 | Language Modeling with Gated Convolutional Networks | |
| LSTM-8192-1024 | 1.8B | 30.6 | Exploring the Limits of Language Modeling | |
| LSTM-8192-1024 + CNN Input | 1.04B | 30.0 | Exploring the Limits of Language Modeling | |
| Evolved Transformer Big | - | 28.6 | The Evolved Transformer | |
| High-Budget MoE | 5B | 28.0 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | |
| DynamicConv | 0.34B | 26.67 | Pay Less Attention with Lightweight and Dynamic Convolutions | |
| SRU++ | 328M | 25.1 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | |
| Cohere Large | - | 25.06 | - | - |
| Mesh Tensorflow | 4.9B | 24.0 | Mesh-TensorFlow: Deep Learning for Supercomputers | |
| Adaptive Input Large | 0.46B | 23.91 | Adaptive Input Representations for Neural Language Modeling | |
| 10 LSTM+CNN inputs + SNM10-SKIP (ensemble) | 43B | 23.7 | Exploring the Limits of Language Modeling | |
| Transformer-XL Base | 0.46B | 23.5 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
| SRU++ Large | 465M | 23.5 | When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | |
| Adaptive Input Very Large | 1.0B | 23.02 | Adaptive Input Representations for Neural Language Modeling | |
| MDLM | 110M | 23.00 | Simple and Effective Masked Diffusion Language Models | |