Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Jonathan Shen; Ruoming Pang; Ron J. Weiss; Mike Schuster; Navdeep Jaitly; Zongheng Yang; Zhifeng Chen; Yu Zhang; Yuxuan Wang; RJ Skerry-Ryan; Rif A. Saurous; Yannis Agiomyrgiannakis; Yonghui Wu

Abstract

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of $4.53$, comparable to a MOS of $4.58$ for professionally recorded speech. To validate our design choices, we present ablation studies of key components of our system and evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and $F_0$ features. We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture.
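
The mel-scale spectrogram that links the two stages can be made more concrete with a small sketch. The code below shows how mel-spaced frequency bands are laid out: evenly on the mel scale, and therefore increasingly wide in Hz. It is only an illustration: the HTK-style conversion formula is one common convention, and the function names and the settings used (80 bands between 125 Hz and 7.6 kHz) are assumptions, since the abstract does not state them.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mel using the HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Return the center frequencies (Hz) of n_mels bands spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


# Illustrative settings, not taken from the abstract.
centers = mel_band_centers(n_mels=80, f_min=125.0, f_max=7600.0)
print(f"first band center: {centers[0]:.1f} Hz")
print(f"last band center:  {centers[-1]:.1f} Hz")
```

Because the bands are narrow at low frequencies and wide at high frequencies, a mel spectrogram keeps far fewer values per frame than a linear-frequency spectrogram, which is one sense in which it is a compact acoustic representation.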

Code Repositories

Mentioned in GitHub (framework in parentheses):

xinshengwang/Tacotron-pytorch (PyTorch)
dipjyoti92/SC-WaveRNN (PyTorch)
anandaswarup/rnn-tts (PyTorch)
TensorSpeech/TensorflowTTS (TensorFlow)
OlaWod/my-tacotron2 (PyTorch)
thepowerfuldeez/tacotron2 (PyTorch)
BogiHsu/Tacotron2-PyTorch (PyTorch)
keonlee9420/Comprehensive-Tacotron2 (PyTorch)
kaiidams/voice100-tts (PyTorch)
xcmyz/FastSpeech (PyTorch)
dathudeptrai/TensorflowTTS (TensorFlow)
vincenzo-scotti/tacotron2 (PyTorch)
s3nh/pytorch-tacotron2 (PyTorch)
anandaswarup/TTS (PyTorch)
bfs18/tacotron2 (PyTorch)
izzajalandoni/tts_models (PyTorch)
martinlenglet/avtacotron2 (PyTorch)
rosinality/melgan-pytorch (PyTorch)
Rayhane-mamah/Tacotron-2 (TensorFlow)
coqui-ai/TTS (PyTorch)
creotiv/RussianTTS-Tacotron2 (PyTorch)
dipjyoti92/TTS-Style-Transfer (PyTorch)
Jeevesh8/Cross-Lingual-Voice-Cloning (PyTorch)
NVIDIA/tacotron2 (PyTorch)
kaiidams/voice100 (PyTorch)
alpharol/Taco_Collection (TensorFlow)
thuhcsi/tacotron (PyTorch)
choiHkk/Transformer-TTS (PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| speech-synthesis-on-north-american-english | Tacotron 2 | Mean Opinion Score: 4.526 |
| speech-synthesis-on-north-american-english | WaveNet (Linguistic) | Mean Opinion Score: 4.341 |
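
A mean opinion score is the arithmetic mean of listener ratings, usually given on a 1 to 5 scale. The sketch below shows that calculation together with a 95% confidence half-width from a normal approximation. The ratings are made up for illustration and the function name is hypothetical; this is not the evaluation procedure or data behind the scores listed above.

```python
import math
import statistics
from typing import List, Tuple


def mean_opinion_score(ratings: List[float]) -> Tuple[float, float]:
    """Return (MOS, 95% confidence half-width) for a list of listener ratings.

    The half-width uses a normal approximation (1.96 standard errors),
    which is an assumption made for this sketch.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a confidence interval")
    mos = statistics.fmean(ratings)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings on a 1-5 scale; not data from the paper.
hypothetical_ratings = [5, 4, 5, 4.5, 4, 5, 4.5, 4]
mos, ci = mean_opinion_score(hypothetical_ratings)
print(f"MOS = {mos:.3f} +/- {ci:.3f}")
```

Reporting the interval alongside the mean helps when two systems score close together, as differences smaller than the interval may not be meaningful.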
