CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency
Keyu An, Hongyu Xiang, Zhijian Ou

Abstract
In this paper, we present a new open-source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data efficiency of the hybrid approach and the simplicity of the end-to-end (E2E) approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show that CAT obtains state-of-the-art results, comparable to those of the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to perform streaming ASR without accuracy degradation. We hope that CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community and can be further explored and improved.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| speech-recognition-on-aishell-1 | CTC-CRF 4gram-LM | Word Error Rate (WER): 6.34 |
| speech-recognition-on-hub5-00-fisher-swbd | CTC-CRF | Word Error Rate (WER): 12 |
| speech-recognition-on-hub500-switchboard | CTC-CRF | CallHome: 18.4; Hub5'00: 14.1; SwitchBoard: 9.7 |
| speech-recognition-on-wsj-dev93 | CTC-CRF VGG-BLSTM | Word Error Rate (WER): 5.7 |
| speech-recognition-on-wsj-eval92 | CTC-CRF VGG-BLSTM | Word Error Rate (WER): 3.2 |
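
The table above reports results as word error rate (WER). As a reminder of what that number measures, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words and expressed as a percentage. The function name `word_error_rate` and the example sentences are hypothetical and chosen for illustration; this is not the scoring script used by CAT or Kaldi, and it assumes whitespace-separated words with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, assuming whitespace-tokenized transcripts."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the")
    # against a six-word reference.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.2f}%")
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions. For a corpus-level figure, errors and reference words are normally summed over all utterances before dividing, rather than averaging per-utterance rates.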