RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

Peng Liu, Dongyang Dai, Zhiyong Wu

Abstract

Recent advances in generative modeling have substantially improved the reconstruction of audio waveforms from various representations. Diffusion models perform well at this task, but they suffer from high latency because they operate at the level of individual sample points and require many sampling steps. In this study, we introduce RFWave, a multi-band Rectified Flow approach designed to reconstruct high-fidelity audio waveforms from Mel-spectrograms or discrete acoustic tokens. RFWave generates complex spectrograms rather than raw samples and operates at the frame level, processing all subbands simultaneously to improve efficiency. Because Rectified Flow targets a straight transport trajectory, RFWave can reconstruct audio in just 10 sampling steps. Our empirical evaluations show that RFWave combines high reconstruction quality with strong computational efficiency, generating audio up to 160 times faster than real time on a GPU. An online demonstration is available at: https://rfwave-demo.github.io/rfwave/.
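The abstract attributes the small step count to Rectified Flow's straight transport trajectory. The sketch below is a toy in NumPy, not RFWave's code, showing why straightness matters: under the common Rectified Flow convention x_t = t·x1 + (1 − t)·x0, with noise x0 at t = 0 and data x1 at t = 1, the regression target is the constant velocity x1 − x0, and a fixed-step Euler solver follows a straight path exactly. The names `rectified_flow_pair`, `euler_sample`, and `straight_velocity` are hypothetical; `straight_velocity` is an oracle standing in for the trained network, which in RFWave would predict velocities for complex-spectrogram frames conditioned on a Mel-spectrogram or acoustic tokens.

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Point on the straight path from noise x0 to data x1, and its velocity target."""
    x_t = t * x1 + (1.0 - t) * x0
    return x_t, x1 - x0

def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 (data) with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

rng = np.random.default_rng(0)
target = rng.standard_normal((4, 8))  # toy stand-in for clean frame-level features
x0 = rng.standard_normal((4, 8))      # Gaussian noise at t = 0

def straight_velocity(x, t):
    # Hypothetical oracle replacing the trained network: the velocity that keeps
    # x on the straight line to `target`. Along that line it equals target - x0.
    return (target - x) / (1.0 - t)

# The oracle reproduces the Rectified Flow training target at an arbitrary t.
x_t, v_target = rectified_flow_pair(x0, target, 0.3)
print(np.allclose(straight_velocity(x_t, 0.3), v_target))  # True

# Ten Euler steps land on the target, up to floating-point rounding.
sample = euler_sample(straight_velocity, x0, num_steps=10)
print(np.abs(sample - target).max())
```

A learned velocity field is only approximately straight, so in practice the error after 10 steps is small rather than zero; the toy only illustrates that the straighter the trajectory, the fewer solver steps are needed.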

Code Repositories

bfs18/rfwave (official, PyTorch)

Benchmarks

Benchmark: speech-synthesis-on-libritts
Method: RFWave
Metrics:
PESQ: 4.228
Periodicity: 0.090
V/UV F1: 0.968
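The two pitch-related metrics above can be sketched as they are commonly defined in vocoder evaluation: V/UV F1 treats the per-frame voiced/unvoiced decisions extracted from the reference recording as ground truth and scores the decisions extracted from the reconstruction, while Periodicity is reported as the RMSE between per-frame periodicity estimates (lower is better). This is an assumption: the pitch tracker, frame rate, and voicing threshold behind the reported numbers are not stated here, and `vuv_f1` and `periodicity_rmse` are hypothetical helper names.

```python
import numpy as np

def vuv_f1(ref_voiced, gen_voiced):
    """F1 score for frame-level voicing, with 'voiced' as the positive class."""
    ref = np.asarray(ref_voiced, dtype=bool)
    gen = np.asarray(gen_voiced, dtype=bool)
    tp = np.sum(ref & gen)   # voiced in both reference and reconstruction
    fp = np.sum(~ref & gen)  # voiced only in the reconstruction
    fn = np.sum(ref & ~gen)  # voiced only in the reference
    if tp == 0:
        return 0.0           # no correctly detected voiced frames
    # Equivalent to 2 * precision * recall / (precision + recall).
    return float(2 * tp / (2 * tp + fp + fn))

def periodicity_rmse(ref_periodicity, gen_periodicity):
    """Root-mean-square error between per-frame periodicity values in [0, 1]."""
    ref = np.asarray(ref_periodicity, dtype=float)
    gen = np.asarray(gen_periodicity, dtype=float)
    return float(np.sqrt(np.mean((ref - gen) ** 2)))

ref_voiced = [1, 1, 1, 0, 0, 1, 1, 0]
gen_voiced = [1, 1, 0, 0, 1, 1, 1, 0]
print(vuv_f1(ref_voiced, gen_voiced))  # 4 TP, 1 FP, 1 FN -> 0.8
print(periodicity_rmse([0.9, 0.8, 0.1], [0.8, 0.8, 0.3]))
```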