| LAS multitask with indicators sampling | 20.4 | Attention model for articulatory features detection | |
| Soft Monotonic Attention (offline) | 20.1 | Online and Linear-Time Attention by Enforcing Monotonic Alignments | |
| Bi-LSTM + skip connections w/ RNN transducer (pretrained) | 17.7 | Speech Recognition with Deep Recurrent Neural Networks | |
| Light Gated Recurrent Units | 16.7 | Light Gated Recurrent Units for Speech Recognition | Li-GRU cell sketched below the table |
| CNN in time and frequency + dropout, 17.6% w/o dropout | 16.7 | - | - |
| Hierarchical maxout CNN + Dropout | 16.5 | - | - |
| RNN + Dropout + BatchNorm + Monophone Reg | 15.9 | The PyTorch-Kaldi Speech Recognition Toolkit | |
| GRU + Dropout + BatchNorm + Monophone Reg | 14.9 | The PyTorch-Kaldi Speech Recognition Toolkit | |
| LSTM + Dropout + BatchNorm + Monophone Reg | 14.5 | The PyTorch-Kaldi Speech Recognition Toolkit | |
| LiGRU + Dropout + BatchNorm + Monophone Reg | 14.2 | The PyTorch-Kaldi Speech Recognition Toolkit | monophone regularization sketched below the table |
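
The two Li-GRU rows above (16.7% and the 14.2% PyTorch-Kaldi recipe) refer to a light GRU variant that drops the reset gate, replaces tanh with ReLU in the candidate state, and batch-normalizes the feed-forward input transforms. Below is a minimal NumPy sketch of a single Li-GRU time step; batch normalization is omitted and the weight names (`W_z`, `U_z`, `W_h`, `U_h`) are chosen here for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ligru_step(x_t, h_prev, W_z, U_z, W_h, U_h):
    """One Li-GRU time step (batch norm on the W*x terms omitted for brevity).

    x_t:    (batch, input_dim)  current acoustic frame
    h_prev: (batch, hidden_dim) previous hidden state
    """
    # Update gate; the Li-GRU has no reset gate
    z_t = sigmoid(x_t @ W_z + h_prev @ U_z)
    # Candidate state uses ReLU instead of tanh
    h_cand = np.maximum(0.0, x_t @ W_h + h_prev @ U_h)
    # Interpolate between the previous state and the candidate
    return z_t * h_prev + (1.0 - z_t) * h_cand

# Toy usage: 4 frames of 40-dim features, 8 hidden units
rng = np.random.default_rng(0)
input_dim, hidden_dim = 40, 8
W_z = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_h = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
U_z = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
U_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
h = np.zeros((1, hidden_dim))
for _ in range(4):
    h = ligru_step(rng.normal(size=(1, input_dim)), h, W_z, U_z, W_h, U_h)
print(h.shape)  # (1, 8)
```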
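
The "+ Monophone Reg" entries refer to the multi-task setup used in the PyTorch-Kaldi experiments, where the acoustic model has two output heads, one predicting context-dependent senones and one predicting monophone targets, and the monophone cross-entropy acts as a regularizer on the shared layers. A hedged sketch of that joint loss follows; the helper `cross_entropy`, the name `joint_loss`, and the `mono_weight` trade-off are illustrative assumptions, not the toolkit's API.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy from raw logits and integer class targets (illustrative helper)."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def joint_loss(cd_logits, cd_targets, mono_logits, mono_targets, mono_weight=1.0):
    """Context-dependent loss plus a monophone term acting as a regularizer.

    cd_logits:   (frames, n_senones)     main output head
    mono_logits: (frames, n_monophones)  auxiliary output head
    mono_weight: assumed scalar trade-off between the two terms
    """
    return (cross_entropy(cd_logits, cd_targets)
            + mono_weight * cross_entropy(mono_logits, mono_targets))

# Toy usage: 5 frames, 2000 senones, 48 monophone classes
rng = np.random.default_rng(0)
loss = joint_loss(
    rng.normal(size=(5, 2000)), rng.integers(0, 2000, size=5),
    rng.normal(size=(5, 48)),   rng.integers(0, 48, size=5),
)
print(float(loss))
```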