5 months ago

SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation

Yassine El Kheir; Shammur Absar Chowdhury; Ahmed Ali; Hamdy Mubarak; Shazia Afzal

Abstract

The lack of labeled second language (L2) speech data is a major challenge in designing mispronunciation detection models. We introduce SpeechBlender, a fine-grained data augmentation pipeline that generates mispronunciation errors to overcome this data scarcity. SpeechBlender uses a variety of masks to target different regions of a phonetic unit, and uses mixing factors to linearly interpolate raw speech signals while augmenting pronunciation. The masks facilitate smooth blending of the signals, generating more effective samples than the 'Cut/Paste' method. Our proposed technique achieves state-of-the-art results on Speechocean762 for ASR-dependent mispronunciation detection models at the phoneme level, with a 2.0% gain in Pearson Correlation Coefficient (PCC) over the previous state of the art [1]. Additionally, we demonstrate a 5.0% improvement at the phoneme level compared to our baseline. We also observe a 4.6% increase in F1-score on the Arabic AraVoiceL2 test set.
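The mask-and-mix idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`smooth_mask`, `blend`, `blend_into_utterance`), the linear ramp shape, and the assumption that the donor phoneme has already been aligned to the same number of samples as the target region are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def smooth_mask(n: int, center_weight: float, ramp: int) -> List[float]:
    """Per-sample weight given to the original phoneme (hypothetical mask shape).

    The weight is 1.0 at both edges (original signal only) and ramps
    linearly toward center_weight, so the segment boundaries stay continuous.
    """
    if n <= 0:
        return []
    ramp = max(0, min(ramp, n // 2))
    mask = [center_weight] * n
    for i in range(ramp):
        t = i / ramp  # 0.0 at the edge, approaching 1.0 inside the segment
        w = 1.0 + t * (center_weight - 1.0)
        mask[i] = w
        mask[n - 1 - i] = w
    return mask


def blend(original: Sequence[float], donor: Sequence[float],
          mask: Sequence[float]) -> List[float]:
    """Linear interpolation of two raw signals: m * original + (1 - m) * donor."""
    if not (len(original) == len(donor) == len(mask)):
        raise ValueError("original, donor and mask must have the same length")
    return [m * o + (1.0 - m) * d for o, d, m in zip(original, donor, mask)]


def blend_into_utterance(utterance: Sequence[float], start: int, end: int,
                         donor: Sequence[float], center_weight: float = 0.5,
                         ramp: int = 4) -> List[float]:
    """Replace utterance[start:end] with a masked blend of itself and donor.

    Assumes donor already has exactly end - start samples.
    """
    segment = utterance[start:end]
    mask = smooth_mask(len(segment), center_weight, ramp)
    mixed = blend(segment, donor, mask)
    return list(utterance[:start]) + mixed + list(utterance[end:])


if __name__ == "__main__":
    utterance = [1.0] * 10   # stand-in for raw audio samples
    donor = [0.0] * 6        # stand-in for a different phoneme's samples
    print(blend_into_utterance(utterance, 2, 8, donor, center_weight=0.5, ramp=2))
```

A hard 'Cut/Paste' substitution corresponds to a mask that is 0.0 everywhere in the region, which leaves an abrupt jump at both boundaries; the ramped mask avoids that discontinuity.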

Benchmarks

Benchmark: phone-level-pronunciation-scoring-on
Methodology: SpeechBlender + LSTM
Metrics: Pearson correlation coefficient (PCC): 0.63
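
For readers unfamiliar with the metric, the PCC reported above measures the linear agreement between predicted and human-annotated pronunciation scores. A minimal sketch of the standard formula follows; the function name and the example scores are hypothetical and are not taken from the paper or the benchmark.

```python
import math
from typing import Sequence


def pearson_correlation(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Standard Pearson correlation between two equal-length score lists."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0.0 or var_r == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_r)


if __name__ == "__main__":
    # Hypothetical phoneme-level scores, for illustration only.
    model_scores = [1.8, 0.4, 1.1, 2.0, 0.9]
    human_scores = [2.0, 0.0, 1.0, 2.0, 1.0]
    print(round(pearson_correlation(model_scores, human_scores), 3))
```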
