| Neural cache model (size = 100) | - | 44.8 | Improving Neural Language Models with a Continuous Cache | - |
| Neural cache model (size = 2,000) | - | 40.8 | Improving Neural Language Models with a Continuous Cache | - |
| RFA-Gate-Gaussian-Stateful (Small) | - | 30.5 | Random Feature Attention | - |
| LSTM (Hebbian, Cache, MbPA) | - | 29.2 | Fast Parametric Learning with Activation Memorization | - |