Deep Speech: Scaling up end-to-end speech recognition

Awni Hannun; Carl Case; Jared Casper; Bryan Catanzaro; Greg Diamos; Erich Elsen; Ryan Prenger; Sanjeev Satheesh; Shubho Sengupta; Adam Coates; Andrew Y. Ng

Abstract

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.

Code Repositories

robmsmt/KerasDeepSpeech (tf)
mozilla/DeepSpeech (tf)
pannous/caffe-speech-recognition (caffe2)
GeorgeFedoseev/DeepSpeech (tf)
RezisEwig/unity_speech
YuBeomGon/DeepSpeech (tf)
Picovoice/stt-benchmark
soarsmu/crossasr (paddle)
Loghijiaha/DeepSpeech-Indo (tf)
mozilla/STT (tf)
mangushev/deep_speech (tf)
msalhab96/SpeeQ (pytorch)
lissyx/STT (tf)
myrtleSoftware/deepspeech (pytorch)
RashadGarayev/TRSpeech-to-text (tf)

Benchmarks

Benchmark | Methodology | Metrics
accented-speech-recognition-on-voxforge | Deep Speech | Percentage error: 45.35
accented-speech-recognition-on-voxforge-1 | Deep Speech | Percentage error: 28.46
accented-speech-recognition-on-voxforge-2 | Deep Speech | Percentage error: 31.20
accented-speech-recognition-on-voxforge-3 | Deep Speech | Percentage error: 15.01
noisy-speech-recognition-on-chime-clean | CNN + Bi-RNN + CTC (speech to letters) | Percentage error: 6.3
noisy-speech-recognition-on-chime-real | CNN + Bi-RNN + CTC (speech to letters) | Percentage error: 67.94
speech-recognition-on-swb_hub_500-wer | CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trained only on SWB | Percentage error: 16
speech-recognition-on-switchboard-hub500 | Deep Speech + FSH | Percentage error: 12.6
speech-recognition-on-switchboard-hub500 | CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trained only on SWB | Percentage error: 12.6
speech-recognition-on-switchboard-hub500 | Deep Speech | Percentage error: 20
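
The "Percentage error" figures above appear to be word error rates (WER), as the "25.9% WER" note in the methodology column suggests. As a minimal sketch of how such a figure is typically computed, the following function takes the word-level edit distance between a reference transcript and a hypothesis and divides it by the number of reference words. The function name and the example sentences are hypothetical and are not taken from the paper or from any benchmark; official scoring tools also normalize text in ways this sketch does not.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate, in percent, of hypothesis against reference.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick brown box"))  # 25.0
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.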
