HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Jungil Kong, Jaehyeon Kim, Jaekyoung Bae

Abstract

Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech audio consists of sinusoidal signals with various periods, we demonstrate that modeling the periodic patterns of audio is crucial for enhancing sample quality. A subjective human evaluation (mean opinion score, MOS) on a single-speaker dataset indicates that our proposed method demonstrates similarity to human quality while generating 22.05 kHz high-fidelity audio 167.9 times faster than real-time on a single V100 GPU. We further show the generality of HiFi-GAN to the mel-spectrogram inversion of unseen speakers and end-to-end speech synthesis. Finally, a small-footprint version of HiFi-GAN generates samples 13.4 times faster than real-time on CPU with comparable quality to an autoregressive counterpart.
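The speed figures in the abstract can be read as throughput. The following is a minimal sketch of that arithmetic, assuming "N times faster than real-time" means N seconds of audio are produced per second of wall-clock time; the function names are illustrative and do not come from the paper or its code.

```python
# Sample rate stated in the abstract: 22.05 kHz.
SAMPLE_RATE_HZ = 22_050


def samples_per_second(speedup, sample_rate_hz=SAMPLE_RATE_HZ):
    """Audio samples generated per wall-clock second at a given real-time speedup.

    Assumes a speedup of N means N seconds of audio per second of compute.
    """
    return speedup * sample_rate_hz


def synthesis_seconds(audio_seconds, speedup):
    """Wall-clock time needed to synthesize `audio_seconds` of audio."""
    return audio_seconds / speedup


if __name__ == "__main__":
    # Speedups quoted in the abstract for the two reported settings.
    for label, speedup in [("single V100 GPU", 167.9), ("CPU, small-footprint", 13.4)]:
        rate = samples_per_second(speedup)
        wall = synthesis_seconds(10.0, speedup)
        print(f"{label}: {rate:,.0f} samples/s, 10 s of audio in {wall:.3f} s")
```

Under this reading, the GPU figure corresponds to roughly 3.7 million samples per second, and the CPU figure to roughly 0.3 million samples per second.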

Code Repositories

rishikksh20/HiFi-GAN (pytorch)
TensorSpeech/TensorflowTTS (tf)
takaaki-saeki/ssl_speech_restoration (pytorch)
jik876/hifi-gan (Official, pytorch)
mindslab-ai/univnet (pytorch)
jaywalnut310/glow-tts (pytorch)
maum-ai/univnet (pytorch)
keonlee9420/Comprehensive-E2E-TTS (pytorch)

Benchmarks

Benchmark: speech-synthesis-on-libritts
Methodology: HiFi-GAN
Metrics:
M-STFT: 1.0017
MCD: 0.6603
PESQ: 2.947
Periodicity: 0.1565
V/UV F1: 0.9300
