Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Dario Amodei; Rishita Anubhai; Eric Battenberg; Carl Case; Jared Casper; Bryan Catanzaro; Jingdong Chen; Mike Chrzanowski; Adam Coates; Greg Diamos; Erich Elsen; Jesse Engel; Linxi Fan; Christopher Fougner; Tony Han; Awni Hannun; Billy Jun; Patrick LeGresley; Libby Lin; Sharan Narang; Andrew Ng; Sherjil Ozair; Ryan Prenger; Jonathan Raiman; Sanjeev Satheesh; David Seetapun; Shubho Sengupta; Yi Wang; Zhiqian Wang; Chong Wang; Bo Xiao; Dani Yogatama; Jun Zhan; Zhenyao Zhu

Abstract

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

Code Repositories

mangelroman/audio2score (pytorch)
GavinGuan95/Punctuator.Pytorch (pytorch)
TensorSpeech/TensorFlowASR (tf)
robmsmt/KerasDeepSpeech (tf)
freshtan/deepspeech2 (mindspore)
raraz15/DeepTurkish (pytorch)
SeanNaren/deepspeech.torch (pytorch)
2023-MindSpore-1/ms-code-63 (mindspore)
baidu-research/warp-ctc (pytorch)
hkakitani/deepspeech.pytorch (pytorch)
PaddlePaddle/models (paddle)
SeanNaren/deepspeech.pytorch (pytorch)
MangoMoe/VerbalVim (pytorch)
2023-MindSpore-1/ms-code-57 (mindspore)
sooftware/OpenSpeech (pytorch)
DeepMark/deepmark (pytorch)
cosmoquester/speech-recognition (tf)
https://gitlab.com/sburud/master (pytorch)
UnofficialJuliaMirror/DeepMark-deepmark (pytorch)
fd873630/deep_speech_2_korean (pytorch)
msalhab96/SpeeQ (pytorch)
switiz/deepspeech2.pytorch (pytorch)
myrtleSoftware/deepspeech (pytorch)

Benchmarks

Benchmark                                     Methodology    Metrics
accented-speech-recognition-on-voxforge       Deep Speech 2  Percentage error: 22.44
accented-speech-recognition-on-voxforge-1     Deep Speech 2  Percentage error: 13.56
accented-speech-recognition-on-voxforge-2     Deep Speech 2  Percentage error: 17.55
accented-speech-recognition-on-voxforge-3     Deep Speech 2  Percentage error: 7.55
noisy-speech-recognition-on-chime-clean       Deep Speech 2  Percentage error: 3.34
noisy-speech-recognition-on-chime-real        Deep Speech 2  Percentage error: 21.79
speech-recognition-on-librispeech-test-clean  Deep Speech 2  Word Error Rate (WER): 5.33
speech-recognition-on-librispeech-test-other  Deep Speech 2  Word Error Rate (WER): 13.25
speech-recognition-on-wsj-eval92              Deep Speech 2  Word Error Rate (WER): 3.60
speech-recognition-on-wsj-eval93              Deep Speech 2  Word Error Rate (WER): 4.98
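
The LibriSpeech and WSJ results above are reported as Word Error Rate. As a rough illustration of how that metric is usually computed, the sketch below takes the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divides it by the number of reference words, and expresses the result as a percentage. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the text normalization and scoring tools used to produce the numbers above are not described on this page and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Illustrative helper only; tokenization is plain whitespace splitting,
    which is an assumption and not necessarily what the benchmarks use.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100 percent.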
