5 months ago

Schrödinger Bridge for Generative Speech Enhancement

Ante Jukić; Roman Korostik; Jagadeesh Balam; Boris Ginsburg

Abstract

This paper proposes a generative speech enhancement model based on the Schrödinger bridge (SB). The proposed model employs a tractable SB to formulate a data-to-data process between the clean speech distribution and the observed noisy speech distribution. The model is trained with a data prediction loss that aims to recover the complex-valued clean speech coefficients, and an auxiliary time-domain loss is used to improve training. The effectiveness of the proposed SB-based model is evaluated on two speech enhancement tasks: speech denoising and speech dereverberation. The experimental results demonstrate that the proposed SB-based model outperforms diffusion-based models in terms of speech quality metrics and ASR performance, e.g., yielding a relative word error rate reduction of 20% for denoising and 6% for dereverberation compared to the best baseline model. The proposed model also demonstrates improved efficiency, achieving better quality than the baselines for the same number of sampling steps and at a reduced computational cost.
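To make the "relative word error rate reduction" figures concrete, the short sketch below shows how such a relative reduction is computed. The function name and the WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, model_wer: float) -> float:
    """Return the relative WER reduction of a model with respect to a baseline.

    Both inputs are word error rates in the same unit (e.g. percent).
    A return value of 0.20 means the model's WER is 20% lower than the baseline's.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - model_wer) / baseline_wer


# Hypothetical numbers, NOT taken from the paper:
# a baseline at 10.0% WER and a model at 8.0% WER.
reduction = relative_wer_reduction(10.0, 8.0)
print(f"relative WER reduction: {reduction:.0%}")
```

Note that a relative reduction is measured against the baseline's error rate, so in this illustration a 20% relative reduction corresponds to an absolute drop of only 2 percentage points.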

Benchmarks

Benchmark: speech-enhancement-on-ears-wham
Methodology: Schrödinger Bridge
Metrics:
DNSMOS: 3.83
ESTOI: 0.73
PESQ-WB: 2.33
POLQA: 3.46
SI-SDR: 17.85
SIGMOS: 3.44
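Of the metrics listed above, SI-SDR (scale-invariant signal-to-distortion ratio, reported in dB) has a simple closed form. The sketch below follows the commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The mean removal, the `eps` stabilizer, and the function name are assumptions of this sketch and are not necessarily the exact evaluation setup behind the benchmark.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean of each signal (assumption: zero-mean variant).
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]

    # Project the estimate onto the reference to obtain the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything in the estimate that is not explained by the target is distortion.
    distortion = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))


# Toy signals for illustration only.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 1.0, -1.2]
print(f"SI-SDR: {si_sdr(estimate, reference):.2f} dB")

# Rescaling the estimate leaves the score essentially unchanged.
print(f"SI-SDR (2x gain): {si_sdr([2 * e for e in estimate], reference):.2f} dB")
```

Because the target component is obtained by projection, multiplying the estimate by a constant gain does not change the score, which is why the metric is called scale-invariant.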
