Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Abstract
Continuous speech separation plays a vital role in complicated speech-related tasks such as conversation transcription. The separation model extracts a single-speaker signal from mixed speech. In this paper, we use the transformer and the conformer in lieu of recurrent neural networks in the separation system, as we believe that capturing global information with self-attention is crucial for speech separation. Evaluated on the LibriCSS dataset, the conformer separation model achieves state-of-the-art results, with relative word error rate (WER) reductions over a bi-directional LSTM (BLSTM) of 23.5% in the utterance-wise evaluation and 15.4% in the continuous evaluation.
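The motivation above rests on self-attention capturing global context. As a minimal illustration only (not the paper's conformer, which also uses multi-head attention, convolution, and feed-forward modules), the pure-Python sketch below computes single-head scaled dot-product self-attention over a toy sequence of frames. Every output frame is a softmax-weighted sum over all input frames in one step, whereas a recurrent layer passes information along the sequence one frame at a time. The frame values and the identity projection matrices are hypothetical placeholders standing in for learned weights.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention over T frames.

    x: T x d input frames; wq, wk, wv: d x d_k projections
    (hypothetical placeholders for learned weights).
    Returns (outputs T x d_k, attention weights T x T).
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wk[0])
    scores = matmul(q, transpose(k))  # T x T: every frame scored against every frame
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy input: 5 frames with 3 features each (illustrative values, not real audio features)
frames = [[0.1 * (t + 1), 0.2, -0.1 * t] for t in range(5)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
out, attn = self_attention(frames, identity, identity, identity)

print(len(out), len(out[0]))  # 5 3
print(all(w > 0 for row in attn for w in row))  # True: each frame attends to all frames
```

Because the softmax weights are strictly positive, each output frame depends on the entire input window, which is the "global information" property the abstract refers to.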
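Benchmarks
Values are word error rates (WER, %) on LibriCSS by session condition: 0L and 0S are the no-overlap sessions with long and short inter-utterance silences, and 10% to 40% are the overlap ratios.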
| Benchmark | Methodology | 0L | 0S | 10% | 20% | 30% | 40% |
|---|---|---|---|---|---|---|---|
| speech-separation-on-libricss | Conformer (large) | 5.0 | 5.4 | 7.5 | 10.7 | 13.8 | 17.1 |
| speech-separation-on-libricss | Conformer (base) | 5.4 | 5.6 | 8.2 | 11.8 | 15.5 | 18.9 |
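The abstract reports results as relative WER reductions, i.e. (baseline WER - system WER) / baseline WER. The BLSTM baseline numbers are not listed here, so the 23.5% and 15.4% figures cannot be reproduced from this page; as a sketch of the same metric, the snippet below applies it to the two table rows, treating Conformer (base) as the reference for Conformer (large). The helper names are hypothetical, and the table values are copied as-is.

```python
from typing import Dict, List

# WER (%) per LibriCSS condition, copied from the benchmark table.
CONDITIONS = ["0L", "0S", "10%", "20%", "30%", "40%"]
CONFORMER_LARGE = [5.0, 5.4, 7.5, 10.7, 13.8, 17.1]
CONFORMER_BASE = [5.4, 5.6, 8.2, 11.8, 15.5, 18.9]


def relative_wer_reduction(baseline: float, system: float) -> float:
    """Relative WER reduction in percent: (baseline - system) / baseline * 100."""
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline - system) / baseline * 100.0


def per_condition_reduction(baseline: List[float], system: List[float]) -> Dict[str, float]:
    # Rounded to one decimal place, matching the precision of the table.
    return {c: round(relative_wer_reduction(b, s), 1)
            for c, b, s in zip(CONDITIONS, baseline, system)}


reductions = per_condition_reduction(CONFORMER_BASE, CONFORMER_LARGE)
print(reductions)
# {'0L': 7.4, '0S': 3.6, '10%': 8.5, '20%': 9.3, '30%': 11.0, '40%': 9.5}
```

On these numbers, the larger configuration's relative gain is smallest in the no-overlap sessions (3.6% to 7.4%) and larger in the overlapped ones (8.5% to 11.0%).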