OverFlow: Putting flows on top of neural transducers for better TTS

Shivam Mehta; Ambika Kirkland; Harm Lameris; Jonas Beskow; Éva Székely; Gustav Eje Henter

Abstract

Neural HMMs are a type of neural transducer recently proposed for sequence-to-sequence modelling in text-to-speech. They combine the best features of classic statistical speech synthesis and modern neural TTS, requiring less data and fewer training updates, and are less prone to gibberish output caused by neural attention failures. In this paper, we combine neural HMM TTS with normalising flows for describing the highly non-Gaussian distribution of speech acoustics. The result is a powerful, fully probabilistic model of durations and acoustics that can be trained using exact maximum likelihood. Experiments show that a system based on our proposal needs fewer updates than comparable methods to produce accurate pronunciations and a subjective speech quality close to natural speech. Please see https://shivammehta25.github.io/OverFlow/ for audio examples and code.
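The claim that a flow-based model "can be trained using exact maximum likelihood" rests on the change-of-variables formula: an invertible transform of a simple base distribution has a density that can be evaluated exactly. The following is a minimal sketch of that idea only, not the OverFlow architecture. The function names, the two-step transform (affine, then exponential) and the parameter values are assumptions chosen for illustration. The resulting density is non-Gaussian (it is log-normal), yet its log-likelihood is exact.

```python
import math


def standard_normal_logpdf(z: float) -> float:
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def flow_logpdf(x: float, scale: float, shift: float) -> float:
    """Exact log-density of x = exp(scale * z + shift) with z ~ N(0, 1).

    Hypothetical toy flow: an affine layer followed by an exponential.
    Both layers are invertible, so the change-of-variables formula gives
    log p(x) = log N(z; 0, 1) + log |dz/dx|.
    """
    if x <= 0.0:
        raise ValueError("x must be positive for this flow")
    if scale == 0.0:
        raise ValueError("scale must be non-zero for the flow to be invertible")

    # Inverse pass: map the observation back to the latent variable.
    u = math.log(x)           # undo the exponential layer
    z = (u - shift) / scale   # undo the affine layer

    # Log-determinant of the inverse: log |du/dx| + log |dz/du|.
    log_det = -math.log(x) - math.log(abs(scale))

    return standard_normal_logpdf(z) + log_det


def negative_log_likelihood(data: list[float], scale: float, shift: float) -> float:
    """Average negative log-likelihood, the quantity maximum likelihood minimises."""
    return -sum(flow_logpdf(x, scale, shift) for x in data) / len(data)
```

Because every step is invertible with a tractable Jacobian, no lower bound or approximation is needed: the value returned by `negative_log_likelihood` is the exact training objective for this toy model.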

Code Repositories

shivammehta25/OverFlow (Official, PyTorch)
coqui-ai/TTS (Official, PyTorch, Mentioned in GitHub)

Benchmarks

Benchmark: text-to-speech-synthesis-on-ljspeech
Methodology: OverFlow
Metrics:
Audio Quality MOS: 3.37
Word Error Rate (WER): 2.30
