DiffWave: A Versatile Diffusion Model for Audio Synthesis

Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro

Abstract

In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive and converts white noise into a structured waveform through a Markov chain with a constant number of steps at synthesis. It is trained efficiently by optimizing a variant of the variational bound on the data likelihood. DiffWave produces high-fidelity audio in different waveform generation tasks, including neural vocoding conditioned on mel spectrograms, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in speech quality (MOS: 4.44 versus 4.43) while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models on the challenging unconditional generation task in terms of audio quality and sample diversity, as measured by various automatic and human evaluations.
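The abstract names two procedures without spelling them out: training by optimizing a variant of the variational bound, and synthesis by running a fixed-length Markov chain from white noise to a waveform. The sketch below illustrates both in plain Python. It assumes the ε-prediction parameterization used in denoising diffusion probabilistic models with an unweighted squared-error loss, and a linearly spaced noise schedule with T = 50 and β from 1e-4 to 0.05; treat these values as assumptions. The function names (`linear_schedule`, `q_sample`, `training_loss`, `sample`) and the `zero_model` stand-in are hypothetical. The real noise predictor ε_θ is a neural network conditioned on the diffusion step (and on a mel spectrogram or class label for conditional tasks), which this sketch does not reproduce.

```python
import math
import random
from typing import Callable, List

Signal = List[float]
# Hypothetical noise predictor eps_theta(x_t, t). In DiffWave this role is played
# by a trained neural network, which is NOT reproduced here.
EpsModel = Callable[[Signal, int], Signal]


def linear_schedule(T: int = 50, beta_start: float = 1e-4, beta_end: float = 0.05):
    """Assumed linear betas, alphas = 1 - beta, and running products alpha_bar."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars: Signal = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def q_sample(x0: Signal, t: int, alpha_bars: Signal, noise: Signal) -> Signal:
    """Diffuse clean audio x0 to step t in closed form (t is 0-based here)."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * n for x, n in zip(x0, noise)]


def training_loss(model: EpsModel, x0: Signal, alpha_bars: Signal,
                  rng: random.Random) -> float:
    """Unweighted epsilon-prediction loss at one uniformly drawn step t."""
    t = rng.randrange(len(alpha_bars))
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = q_sample(x0, t, alpha_bars, noise)
    pred = model(x_t, t)
    return sum((n - p) ** 2 for n, p in zip(noise, pred)) / len(x0)


def sample(model: EpsModel, length: int, schedule, rng: random.Random) -> Signal:
    """Start from white noise and run the fixed T-step reverse chain."""
    betas, alphas, alpha_bars = schedule
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for t in reversed(range(len(betas))):
        eps = model(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Posterior variance: (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t
            var = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            x = [m + math.sqrt(var) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # final step: return the mean without adding noise
    return x


# Usage with a hypothetical untrained stand-in that always predicts zero noise.
def zero_model(x_t: Signal, t: int) -> Signal:
    return [0.0] * len(x_t)


schedule = linear_schedule()
rng = random.Random(0)
loss = training_loss(zero_model, [0.1, -0.2, 0.3], schedule[2], rng)
waveform = sample(zero_model, 16, schedule, rng)
print(f"loss={loss:.4f}, samples={len(waveform)}")
```

The number of network evaluations at synthesis is T regardless of waveform length, and each step updates every audio sample at once, whereas an autoregressive model such as WaveNet needs one evaluation per output sample. That is the intuition behind the speed gap the abstract reports.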

Code Repositories

albertfgu/diffwave-sashimi (PyTorch)
revsic/tf-diffwave (TensorFlow)
neillu23/DiffuSE (PyTorch)
neillu23/cdiffuse (PyTorch)
philsyn/diffwave-vocoder (PyTorch)
philsyn/diffwave-unconditional (PyTorch)
lmnt-com/diffwave (PyTorch)
revsic/jax-variational-diffwave (JAX)
rf5/diffwave-unconditional (PyTorch)
revsic/torch-diffusion-wavegan (PyTorch)

Benchmarks

Benchmark: speech-synthesis-on-ljspeech
Method: DiffWave LARGE
Metric: Mean Opinion Score (MOS), 4.44
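Mean Opinion Score (MOS) is the average of listener ratings on a 1 to 5 scale and is commonly reported with a 95% confidence interval. A minimal sketch, assuming a normal approximation for the interval; the function name `mos_with_ci` and the ratings in the usage line are hypothetical and are not data from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple


def mos_with_ci(ratings: Sequence[int], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% CI) for 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mean = statistics.fmean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings, for illustration only.
print(mos_with_ci([4, 5, 4, 4, 5, 3, 4, 5]))
```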
