CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

Abdulatif Sherif ; Cao Ruizhe ; Yang Bin

Abstract

In this work, we further develop the conformer-based metric generative adversarial network (CMGAN) model for speech enhancement (SE) in the time-frequency (TF) domain. This paper builds on our previous work but takes a more in-depth look by conducting extensive ablation studies on model inputs and architectural design choices. We rigorously tested the generalization ability of the model to unseen noise types and distortions. We have fortified our claims through DNS-MOS measurements and listening tests. Rather than focusing exclusively on the speech denoising task, we extend this work to address the dereverberation and super-resolution tasks. This necessitated exploring various architectural changes, specifically metric discriminator scores and masking techniques. It is essential to highlight that this is among the earliest works that attempted complex TF-domain super-resolution. Our findings show that CMGAN outperforms existing state-of-the-art methods in the three major speech enhancement tasks: denoising, dereverberation, and super-resolution. For example, in the denoising task using the Voice Bank+DEMAND dataset, CMGAN notably exceeded the performance of prior models, attaining a PESQ score of 3.41 and an SSNR of 11.10 dB. Audio samples and CMGAN implementations are available online.

Code Repositories

ruizhecao96/cmgan (official, PyTorch, mentioned in GitHub)
SherifAbdulatif/CMGAN (official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
audio-super-resolution-on-vctk-multi-speaker-1 | CMGAN | Log-Spectral Distance: 0.76
speech-enhancement-on-demand | CMGAN | CBAK: 3.94; COVL: 4.12; CSIG: 4.63; PESQ (wb): 3.41; SSNR: 11.1; STOI: 96
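
For readers unfamiliar with the SSNR figure reported above, the following is a minimal sketch of a segmental signal-to-noise ratio in NumPy. It is not taken from the CMGAN repositories. The function name `segmental_snr`, the frame length, the hop size, and the clipping range of -10 dB to 35 dB are assumptions based on common conventions, so values from this sketch may not match the paper's evaluation scripts exactly.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=512, hop=256,
                  min_db=-10.0, max_db=35.0, eps=1e-10):
    """Average frame-wise SNR in dB between a clean and an enhanced signal.

    frame_len, hop, min_db and max_db are illustrative defaults (assumptions),
    not values confirmed by the source.
    """
    clean = np.asarray(clean, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)

    # Truncate both signals to a common length.
    n = min(len(clean), len(enhanced))
    clean, enhanced = clean[:n], enhanced[:n]

    frame_snrs = []
    for start in range(0, n - frame_len + 1, hop):
        ref = clean[start:start + frame_len]
        err = ref - enhanced[start:start + frame_len]
        # Per-frame SNR: reference energy over residual error energy.
        snr_db = 10.0 * np.log10((np.sum(ref ** 2) + eps) /
                                 (np.sum(err ** 2) + eps))
        # Clip each frame so silent or perfect frames do not dominate the mean.
        frame_snrs.append(np.clip(snr_db, min_db, max_db))

    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return float(np.mean(frame_snrs))
```

With these assumed defaults, passing the same signal as both `clean` and `enhanced` returns the upper clipping limit of 35 dB, since the residual error in every frame is zero.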
