WaveFlow: A Compact Flow-based Model for Raw Audio

Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song

Abstract

In this work, we propose WaveFlow, a small-footprint generative flow for raw audio that is trained directly with maximum likelihood. It handles the long-range structure of 1-D waveforms with a dilated 2-D convolutional architecture, while modeling local variations with expressive autoregressive functions. WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases. It generates high-fidelity speech, as WaveNet does, while synthesizing several orders of magnitude faster, because it requires only a few sequential steps to generate very long waveforms with hundreds of thousands of time-steps. Furthermore, it significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has only 5.91M parameters, which is 15$\times$ smaller than WaveGlow. It can generate 22.05 kHz high-fidelity audio 42.6$\times$ faster than real-time (at a rate of 939.3 kHz) on a V100 GPU without engineered inference kernels.
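
To make the numbers in the abstract concrete, here is a minimal Python sketch. It checks that 42.6$\times$ real-time at 22.05 kHz corresponds to roughly 939.3 kHz, and it illustrates one way a 1-D waveform could be reshaped into a 2-D matrix so that generation proceeds row by row rather than sample by sample. The column-major layout, the function names, and the example height of 16 are assumptions made for illustration; the abstract does not specify them, and the step count shown is only an order-of-magnitude view (per flow), not the authors' implementation.

```python
from __future__ import annotations

# Sample rate and speed-up factor are taken from the abstract.
SAMPLE_RATE_HZ = 22_050
SPEEDUP_VS_REAL_TIME = 42.6


def synthesis_rate_khz(sample_rate_hz: float, speedup: float) -> float:
    """Samples generated per second of wall-clock time, in kHz."""
    return sample_rate_hz * speedup / 1000.0


def squeeze_column_major(waveform: list, height: int) -> list:
    """Reshape a 1-D waveform into `height` rows (assumed column-major layout).

    Adjacent samples land in the same column, so row i holds samples
    i, i + height, i + 2 * height, ...
    """
    if len(waveform) % height != 0:
        raise ValueError("waveform length must be a multiple of height")
    return [waveform[row::height] for row in range(height)]


def sequential_steps(num_samples: int, height: int | None = None) -> int:
    """Rough count of sequential generation steps (illustrative assumption).

    height=None models sample-by-sample autoregression: one step per sample.
    Otherwise the model is assumed to be autoregressive over rows only, so
    the count is on the order of the number of rows.
    """
    return num_samples if height is None else height


if __name__ == "__main__":
    print(round(synthesis_rate_khz(SAMPLE_RATE_HZ, SPEEDUP_VS_REAL_TIME), 1))
    print(squeeze_column_major(list(range(8)), height=4))
    # Ten seconds of 22.05 kHz audio: per-sample steps vs. a hypothetical 16 rows.
    print(sequential_steps(220_500), sequential_steps(220_500, height=16))
```

Under these assumptions, the gap between hundreds of thousands of per-sample steps and a handful of row-wise steps is what the abstract refers to as requiring "only a few sequential steps".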

Code Repositories

L0SG/NanoFlow (pytorch)
caillonantoine/waveflow (pytorch)
PaddlePaddle/Parakeet (paddle, official)
L0SG/WaveFlow (pytorch)

Benchmarks

Benchmark: Speech Synthesis on LibriTTS
Methodology: WaveFlow
Metrics:
M-STFT: 1.1120
MCD: 1.2455
PESQ: 3.027
Periodicity: 0.1416
V/UV F1: 0.9410
