| LAS multitask with indicators sampling | 20.4 | Attention model for articulatory features detection |  | 
| Soft Monotonic Attention (offline) | 20.1 | Online and Linear-Time Attention by Enforcing Monotonic Alignments |  | 
| Bi-LSTM + skip connections w/ CTC | 17.7 | Speech Recognition with Deep Recurrent Neural Networks |  | 
| Light Gated Recurrent Units | 16.7 | Light Gated Recurrent Units for Speech Recognition |  | 
| CNN in time and frequency + dropout (17.6% without dropout) | 16.7 | - | - | 
| Hierarchical maxout CNN + Dropout | 16.5 | - | - | 
| RNN + Dropout + BatchNorm + Monophone Reg | 15.9 | The PyTorch-Kaldi Speech Recognition Toolkit |  | 
| GRU + Dropout + BatchNorm + Monophone Reg | 14.9 | The PyTorch-Kaldi Speech Recognition Toolkit |  | 
| LSTM + Dropout + BatchNorm + Monophone Reg | 14.5 | The PyTorch-Kaldi Speech Recognition Toolkit |  | 
| LiGRU + Dropout + BatchNorm + Monophone Reg | 14.2 | The PyTorch-Kaldi Speech Recognition Toolkit |  |