SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Daniel S. Park; William Chan; Yu Zhang; Chung-Cheng Chiu; Barret Zoph; Ekin D. Cubuk; Quoc V. Le

Abstract

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model. This compares to the previous state-of-the-art hybrid system of 7.5% WER. For Switchboard, we achieve 7.2%/14.6% on the Switchboard/CallHome portion of the Hub5'00 test set without the use of a language model, and 6.8%/14.1% with shallow fusion, which compares to the previous state-of-the-art hybrid system at 8.3%/17.3% WER.
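
The two masking steps named in the abstract (blocks of frequency channels and blocks of time steps) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_features`, its parameters, the default widths, the uniform sampling of widths and start positions, and the choice of `0.0` as the fill value are all assumptions made for illustration, and the warping step is omitted entirely. Features are represented as a plain list of frames, each frame being a list of filter bank coefficients.

```python
import random


def mask_features(features, num_freq_masks=1, max_freq_width=2,
                  num_time_masks=1, max_time_width=2,
                  mask_value=0.0, seed=None):
    """Return a copy of `features` with random frequency and time blocks masked.

    `features` is a list of frames; each frame is a list of channel values.
    All parameter names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    augmented = [list(frame) for frame in features]  # leave the input untouched
    num_frames = len(augmented)
    num_channels = len(augmented[0]) if augmented else 0

    # Frequency masking: overwrite a contiguous band of channels in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_channels))
        start = rng.randint(0, num_channels - width)
        for frame in augmented:
            for channel in range(start, start + width):
                frame[channel] = mask_value

    # Time masking: overwrite a contiguous run of frames across all channels.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            augmented[t] = [mask_value] * num_channels

    return augmented


if __name__ == "__main__":
    # Toy input: 6 frames of 5 channels, all ones, so masked cells are easy to see.
    toy = [[1.0] * 5 for _ in range(6)]
    for frame in mask_features(toy, seed=0):
        print(frame)
```

Because the masks are drawn fresh on each call, applying the function once per training example per epoch yields a different corrupted view of the same utterance each time. In a real pipeline the same logic would normally operate on an array or tensor rather than nested lists.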

Code Repositories

Repositories mentioned on GitHub, with the framework listed for each:

shuaijiang/Whisper-Finetune (PyTorch)
mozilla/DeepSpeech (TensorFlow)
lRomul/argus-freesound (PyTorch)
andychinka/dcase-challenge (PyTorch)
freds0/data_augmentation_for_asr (PyTorch)
MichaelisTrofficus/spec_augment (TensorFlow)
HeleneFabia/keyword-spotter (PyTorch)
viig99/mixmatch-freesound (PyTorch)
cosmoquester/speech-recognition (TensorFlow)
biyoml/End-to-End-Mandarin-ASR (PyTorch)
DemisEom/SpecAugment (PyTorch)
shelling203/SpecAugment (PyTorch)
ZhengkunTian/OpenTransformer (PyTorch)
google-research/leaf-audio (TensorFlow)
kimjeongsun/specaugment (PyTorch)
jackjhliu/End-to-End-Mandarin-ASR (PyTorch)
hgstudent/las (TensorFlow)
park-cheol/ASR-Conformer (PyTorch)
HLasse/wav2vec_finetune (PyTorch)
SarthakYadav/audax (JAX)
iver56/audiomentations (PyTorch)
audio-westlakeu/rct (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
--- | --- | ---
speech-recognition-on-hub500-switchboard | LAS + SpecAugment (with LM, Switchboard mild policy) | CallHome: 14.6; SwitchBoard: 6.8
speech-recognition-on-hub500-switchboard | LAS + SpecAugment (with LM, Switchboard strong policy) | CallHome: 14; SwitchBoard: 7.1
speech-recognition-on-librispeech-test-clean | LAS (no LM) | Word Error Rate (WER): 2.7
speech-recognition-on-librispeech-test-clean | LAS + SpecAugment | Word Error Rate (WER): 2.5
speech-recognition-on-librispeech-test-other | LAS + SpecAugment | Word Error Rate (WER): 5.8
speech-recognition-on-librispeech-test-other | LAS (no LM) | Word Error Rate (WER): 6.5
