On the limit of English conversational speech recognition

Zoltán Tüske; George Saon; Brian Kingsbury

Abstract

In our previous work we demonstrated that a single-headed attention encoder-decoder model is able to reach state-of-the-art results in conversational speech recognition. In this paper, we further improve the results for both Switchboard-300 and Switchboard-2000. Through the use of an improved optimizer, speaker vector embeddings, and alternative speech representations, we reduce the recognition errors of our LSTM system on Switchboard-300 by 4% relative. Compensation of the decoder model with the probability ratio approach allows more efficient integration of an external language model, and we report 5.9% and 11.5% WER on the SWB and CHM parts of Hub5'00 with very simple LSTM models. Our study also considers the recently proposed conformer and more advanced self-attention-based language models. Overall, the conformer shows similar performance to the LSTM; nevertheless, their combination and decoding with an improved LM reaches a new record on Switchboard-300: 5.0% and 10.0% WER on SWB and CHM. Our findings are also confirmed on Switchboard-2000, and a new state of the art is reported, practically reaching the limit of the benchmark.
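The probability ratio idea mentioned in the abstract can be illustrated with a small rescoring sketch. It assumes the commonly used log-linear form, in which an estimate of the decoder's internal (source-domain) language model is subtracted from the encoder-decoder score before the external language model is added. The function names, the hypothesis fields, and the weight values below are hypothetical and chosen only for illustration; the abstract does not state the paper's exact formulation or tuned weights.

```python
# Minimal sketch of probability-ratio style LM fusion for rescoring.
# All names and weights here are illustrative assumptions, not the paper's values.

def fused_score(log_p_asr, log_p_src_lm, log_p_ext_lm,
                src_weight=0.3, ext_weight=0.5):
    """Combine log-probabilities of one hypothesis.

    log_p_asr:    log p(y | x) from the encoder-decoder model
    log_p_src_lm: log p(y) under an LM approximating the decoder's implicit LM
    log_p_ext_lm: log p(y) under the external LM
    """
    return log_p_asr - src_weight * log_p_src_lm + ext_weight * log_p_ext_lm


def rerank(hypotheses, src_weight=0.3, ext_weight=0.5):
    """Return the text of the hypothesis with the highest fused score."""
    best = max(
        hypotheses,
        key=lambda h: fused_score(h["asr"], h["src_lm"], h["ext_lm"],
                                  src_weight, ext_weight),
    )
    return best["text"]


# Toy n-best list with made-up log-probabilities.
nbest = [
    {"text": "hypothesis A", "asr": -4.0, "src_lm": -6.0, "ext_lm": -12.0},
    {"text": "hypothesis B", "asr": -4.5, "src_lm": -7.0, "ext_lm": -8.0},
]

print(rerank(nbest))                                  # fused ranking
print(rerank(nbest, src_weight=0.0, ext_weight=0.0))  # encoder-decoder score only
```

With both weights set to zero the ranking falls back to the encoder-decoder score alone, which makes it easy to see how much the language model terms change the choice of hypothesis.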

Benchmarks

Benchmark | Methodology | Metrics
speech-recognition-on-swb_hub_500-wer | IBM (LSTM+Conformer encoder-decoder) | Percentage error: 6.8
speech-recognition-on-switchboard-hub500 | IBM (LSTM+Conformer encoder-decoder) | Percentage error: 4.3
