Using Speech Synthesis to Train End-to-End Spoken Language Understanding Models

Loren Lugosch, Brett Meyer, Derek Nowrouzezahrai, Mirco Ravanelli

Abstract

End-to-end models are an attractive new approach to spoken language understanding (SLU) in which the meaning of an utterance is inferred directly from the raw audio without employing the standard pipeline composed of a separately trained speech recognizer and natural language understanding module. The downside of end-to-end SLU is that in-domain speech data must be recorded to train the model. In this paper, we propose a strategy for overcoming this requirement in which speech synthesis is used to generate a large synthetic training dataset from several artificial speakers. Experiments on two open-source SLU datasets confirm the effectiveness of our approach, both as a sole source of training data and as a form of data augmentation.
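The data-generation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Example` record, the `build_synthetic_set` helper, the `fake_tts` stand-in, the intent labels, and the voice names are all hypothetical, chosen only to show how each labelled text could be rendered once per artificial speaker and then either used alone or pooled with real recordings.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Example:
    text: str        # transcript fed to the synthesizer
    intent: str      # SLU label the model should predict
    speaker: str     # identifier of the (artificial or real) voice
    audio: bytes     # waveform payload; opaque in this sketch
    synthetic: bool  # True if the audio was produced by TTS


def build_synthetic_set(
    labelled_texts: List[Tuple[str, str]],
    speakers: List[str],
    synthesize: Callable[[str, str], bytes],
) -> List[Example]:
    """Render every (text, intent) pair once per artificial speaker."""
    return [
        Example(text, intent, spk, synthesize(text, spk), True)
        for (text, intent), spk in product(labelled_texts, speakers)
    ]


def fake_tts(text: str, speaker: str) -> bytes:
    # Assumption: stand-in for a real TTS engine; returns placeholder bytes.
    return f"{speaker}:{text}".encode("utf-8")


# Hypothetical in-domain texts with hypothetical intent labels.
labelled = [
    ("turn on the lights", "activate_lights"),
    ("play some music", "play_music"),
]
voices = ["voice_a", "voice_b", "voice_c"]

# Setting 1: synthetic speech as the sole source of training data.
synthetic = build_synthetic_set(labelled, voices, fake_tts)

# Setting 2: synthetic speech as augmentation, pooled with real recordings.
real = [Example("turn on the lights", "activate_lights", "human_1", b"...", False)]
train_set = real + synthetic
```

Passing the synthesizer in as a callable keeps the sketch independent of any particular TTS system; a real engine would replace `fake_tts` and return actual waveforms.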

Code Repositories

lorenlugosch/pretrain_speech_model (PyTorch, mentioned in GitHub)
dscripka/openwakeword (PyTorch, mentioned in GitHub)
lorenlugosch/end-to-end-SLU (Official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
spoken-language-understanding-on-snips | Real + synthetic | Accuracy (%): 71.4
