HyperAI

3 months ago

Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition

Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt

Abstract

Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. Addressing overconfidence in such models, in this paper we introduce the concept of relaxed attention: a simple, gradual injection of a uniform distribution into the encoder-decoder attention weights during training, which can be implemented in two lines of code. We investigate the effect of relaxed attention across different AED model architectures and two prominent ASR tasks, Wall Street Journal (WSJ) and LibriSpeech. We found that transformers trained with relaxed attention consistently outperform the standard baseline models when decoding with external language models. On WSJ, we set a new benchmark for transformer-based end-to-end speech recognition with a word error rate of 3.65%, outperforming the state of the art (4.20%) by 13.1% relative, while introducing only a single hyperparameter.
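The abstract describes relaxed attention as blending a uniform distribution into the encoder-decoder attention weights during training. Below is a minimal sketch, assuming the blend takes the form (1 − γ)·a + γ/T, where a is one decoder step's attention distribution over T encoder frames and γ is the single relaxation hyperparameter. The function name `relax_attention` and its `training` flag are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List


def relax_attention(
    weights: List[List[float]], gamma: float, training: bool = True
) -> List[List[float]]:
    """Blend each attention distribution with a uniform distribution.

    Hypothetical helper: `weights` holds one row per decoder step, each row a
    probability distribution over T encoder frames; `gamma` is the assumed
    relaxation coefficient in [0, 1].
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if not training:
        # Assumption: relaxation is applied only during training, so
        # inference uses the unmodified attention weights (returned as a copy).
        return [list(row) for row in weights]
    relaxed = []
    for row in weights:
        uniform = 1.0 / len(row)  # 1/T for this row's encoder length
        # Convex combination of two distributions, so each row still sums to 1.
        relaxed.append([(1.0 - gamma) * w + gamma * uniform for w in row])
    return relaxed


# Usage: relax a sharply peaked distribution over four encoder frames.
peaked = [[0.7, 0.2, 0.1, 0.0]]
print(relax_attention(peaked, gamma=0.2))
```

Because the result is a convex combination, every row still sums to one, and with γ = 0 the weights are unchanged. In a PyTorch model the same blend could be written on an attention tensor `attn` as `(1 - gamma) * attn + gamma / attn.size(-1)`, guarded by `self.training`; this is an illustration, not the code in the espresso repository. As a check on the abstract's arithmetic, (4.20 − 3.65) / 4.20 ≈ 0.131, which matches the stated 13.1% relative improvement.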

Code Repositories

freewym/espresso (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metric
speech-recognition-on-librispeech-test-other | Conformer with Relaxed Attention | Word Error Rate (WER): 6.85
speech-recognition-on-wsj-eval92 | Transformer with Relaxed Attention | Word Error Rate (WER): 3.19