PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee

Abstract

Recently, universal waveform generation tasks have been investigated conditioned on various out-of-distribution scenarios. Although GAN-based methods have shown their strength in fast waveform generation, they are vulnerable to train-inference mismatch scenarios such as two-stage text-to-speech. Meanwhile, diffusion-based models have shown powerful generative performance in other domains; however, they stay out of the limelight in waveform generation tasks because of their slow inference speed. Above all, there is no generator architecture that can explicitly disentangle the natural periodic features of high-resolution waveform signals. In this paper, we propose PeriodWave, a novel universal waveform generation model. First, we introduce a period-aware flow matching estimator that can capture the periodic features of the waveform signal when estimating the vector fields. Additionally, we utilize a multi-period estimator that avoids overlaps to capture different periodic features of waveform signals. Although increasing the number of periods can improve performance significantly, it also increases the computational cost. To mitigate this cost, we also propose a single period-conditional universal estimator that can run its feed-forward pass in parallel through period-wise batch inference. Additionally, we utilize the discrete wavelet transform to losslessly disentangle the frequency information of waveform signals for high-frequency modeling, and introduce FreeU to reduce high-frequency noise in waveform generation. The experimental results demonstrate that our model outperforms previous models in both Mel-spectrogram reconstruction and text-to-speech tasks. All source code will be available at https://github.com/sh-lee-prml/PeriodWave.
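
The abstract's claim that a discrete wavelet transform splits a waveform into frequency bands without loss can be illustrated with a small sketch. The example below is not taken from the PeriodWave code: it assumes a one-level Haar wavelet (the abstract does not name the wavelet used), and the function names haar_dwt and haar_idwt are hypothetical. It only shows that the low and high bands together reconstruct the input exactly, up to floating-point error.

```python
import numpy as np


def haar_dwt(x):
    """One-level Haar DWT (assumed wavelet): split an even-length signal
    into a low-frequency band and a high-frequency band, each half as long."""
    x = np.asarray(x, dtype=np.float64)
    if x.size % 2 != 0:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # scaled pairwise sums (coarse content)
    high = (even - odd) / np.sqrt(2.0)  # scaled pairwise differences (fine detail)
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: interleave the recovered even and odd samples."""
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    x = np.empty(low.size * 2, dtype=np.float64)
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x


# Toy "waveform": 16 random samples standing in for real audio.
rng = np.random.default_rng(0)
wave = rng.standard_normal(16)

low, high = haar_dwt(wave)
recon = haar_idwt(low, high)

# Largest reconstruction error; expected to be at floating-point precision.
max_err = float(np.max(np.abs(wave - recon)))
```

Because the transform is orthogonal, the two half-rate bands carry the same total energy as the input, so a model can treat the high band separately without discarding information.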

Code Repositories

sh-lee-prml/periodwave (official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark: speech-synthesis-on-libritts
Methodology: PeriodWave + FreeU
Metrics:
M-STFT: 1.0269
PESQ: 4.248
Periodicity: 0.0765
V/UV F1: 0.9651
