HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

Max W. Y. Lam Jun Wang Dan Su Dong Yu

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

Abstract

Diffusion probabilistic models (DPMs) and their extensions have emerged as competitive generative models yet confront challenges of efficient sampling. We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can train with a novel bilateral modeling objective. We show that the new surrogate objective can achieve a lower bound of the log marginal likelihood tighter than a conventional surrogate. We also find that BDDM allows inheriting pre-trained score network parameters from any DPMs and consequently enables speedy and stable learning of the schedule network and optimization of a noise schedule for sampling. Our experiments demonstrate that BDDMs can generate high-fidelity audio samples with as few as three sampling steps. Moreover, compared to other state-of-the-art diffusion-based neural vocoders, BDDMs produce comparable or higher quality samples indistinguishable from human speech, notably with only seven sampling steps (143x faster than WaveGrad and 28.6x faster than DiffWave). We release our code at https://github.com/tencent-ailab/bddm.

Code Repositories

tencent-ailab/bddm
Official
pytorch
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
speech-synthesis-on-ljspeechBDDM vocoder
Mean Opinion Score: 4.48

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis | Papers | HyperAI