Improved training of end-to-end attention models for speech recognition
Albert Zeyer; Kazuki Irie; Ralf Schlüter; Hermann Ney

Abstract
Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition. In this work, we show that such models can achieve competitive results on the Switchboard 300h and LibriSpeech 1000h tasks. In particular, we report the state-of-the-art word error rates (WER) of 3.54% on the dev-clean and 3.82% on the test-clean evaluation subsets of LibriSpeech. We introduce a new pretraining scheme by starting with a high time reduction factor and lowering it during training, which is crucial both for convergence and final performance. In some experiments, we also use an auxiliary CTC loss function to help the convergence. In addition, we train long short-term memory (LSTM) language models on subword units. By shallow fusion, we report up to 27% relative improvements in WER over the attention baseline without a language model.
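The abstract's shallow fusion refers to combining the attention decoder's per-token scores with an external language model's scores at decoding time. Below is a minimal, hypothetical sketch of that score combination inside a simplified beam-search expansion step; the function names, the fusion weight, and the beam size are illustrative assumptions, not details taken from the paper's implementation.

```python
# Hypothetical sketch of shallow fusion: the attention model's log-probability
# for each candidate subword is combined with a weighted LM log-probability.
# The weight (lambda) is an assumed value; in practice it is tuned on a dev set.

LM_WEIGHT = 0.3  # assumed fusion weight lambda


def shallow_fusion_score(att_log_prob: float,
                         lm_log_prob: float,
                         lm_weight: float = LM_WEIGHT) -> float:
    """score(y_t) = log P_att(y_t | y_<t, x) + lambda * log P_lm(y_t | y_<t)"""
    return att_log_prob + lm_weight * lm_log_prob


def expand_hypothesis(hyp_score: float,
                      att_log_probs: dict,
                      lm_log_probs: dict,
                      beam_size: int = 12):
    """Extend one partial hypothesis with every candidate subword and keep
    the beam_size best extensions, ranked by the accumulated fused score."""
    candidates = []
    for token, att_lp in att_log_probs.items():
        fused = shallow_fusion_score(att_lp, lm_log_probs[token])
        candidates.append((hyp_score + fused, token))
    candidates.sort(reverse=True)
    return candidates[:beam_size]
```

In this scheme the language model only re-weights the decoder's scores during search; neither model's parameters change, which is what distinguishes shallow fusion from deep or cold fusion approaches.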
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| speech-recognition-on-librispeech-test-clean | Seq-to-seq attention | Word Error Rate (WER): 3.82 |