3 months ago

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard

Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury

Abstract

It is generally believed that direct sequence-to-sequence (seq2seq) speech recognition models are competitive with hybrid models only when a large amount of training data, at least a thousand hours, is available. In this paper, we show that state-of-the-art recognition performance can be achieved on the Switchboard-300 database using a single-headed attention, LSTM-based model. Using a cross-utterance language model, our single-pass, speaker-independent system reaches 6.4% and 12.5% word error rate (WER) on the Switchboard and CallHome subsets of Hub5'00, without a pronunciation lexicon. While careful regularization and data augmentation are crucial to achieving this level of performance, experiments on Switchboard-2000 show that nothing is more useful than more data. Overall, the combination of various regularizations and a simple but fairly large model results in a new state of the art, 4.7% and 7.8% WER on the Switchboard and CallHome sets, using SWB-2000 without any external data resources.
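For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions and do not come from the paper. Official evaluations typically normalize the text before scoring, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the")
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {100 * wer:.1f}%")
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.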

Benchmarks

Benchmark | Methodology | Metrics
speech-recognition-on-swb_hub_500-wer | IBM (LSTM encoder-decoder) | Percentage error: 7.8
speech-recognition-on-switchboard-hub500 | IBM (LSTM encoder-decoder) | Percentage error: 4.7
