5 months ago

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

Lu Ye-Xin; Ai Yang; Ling Zhen-Hua

Abstract

Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome this issue, we propose MP-SENet, a novel Speech Enhancement Network that explicitly enhances Magnitude and Phase spectra in parallel. The proposed MP-SENet comprises a Transformer-embedded encoder-decoder architecture. The encoder encodes the input distorted magnitude and phase spectra into time-frequency representations, which are then fed into time-frequency Transformers that alternately capture time and frequency dependencies. The decoder comprises a magnitude mask decoder and a phase decoder, which directly enhance the magnitude and wrapped phase spectra by incorporating a magnitude masking architecture and a phase parallel estimation architecture, respectively. Multi-level loss functions explicitly defined on the magnitude spectra, wrapped phase spectra, and short-time complex spectra are adopted to jointly train the MP-SENet model. A metric discriminator is further employed to compensate for the incomplete correlation between these losses and human auditory perception. Experimental results demonstrate that the proposed MP-SENet achieves state-of-the-art performance across multiple speech enhancement tasks, including speech denoising, dereverberation, and bandwidth extension. Compared to existing phase-aware speech enhancement methods, it further mitigates the compensation effect between the magnitude and phase through explicit phase estimation, elevating the perceptual quality of enhanced speech.
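The decoder idea described in the abstract (a mask applied to the magnitude, and a wrapped phase estimated alongside it) can be illustrated with a small, framework-free sketch. This is not the authors' implementation. It assumes that the wrapped phase is recovered with a two-argument arctangent from two real-valued outputs, and that phase errors are scored with an anti-wrapping function so that differences of a full turn count as zero; the abstract does not spell out either detail. All function names are hypothetical.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi

def apply_magnitude_mask(noisy_mag, mask):
    """Scale each noisy magnitude bin by its predicted mask value."""
    return [m * g for m, g in zip(noisy_mag, mask)]

def wrapped_phase(pseudo_real, pseudo_imag):
    """Assumed phase head: atan2 of two real outputs, always within [-pi, pi]."""
    return [math.atan2(i, r) for r, i in zip(pseudo_real, pseudo_imag)]

def anti_wrap(delta):
    """Fold a phase difference into [0, pi] so that multiples of 2*pi cost nothing."""
    return abs(delta - TWO_PI * round(delta / TWO_PI))

def phase_loss(pred_phase, clean_phase):
    """Mean anti-wrapped absolute phase error (an illustrative loss, not the paper's exact one)."""
    errors = [anti_wrap(p - c) for p, c in zip(pred_phase, clean_phase)]
    return sum(errors) / len(errors)

def to_complex(mag, phase):
    """Recombine magnitude and phase into complex spectrum bins."""
    return [cmath.rect(m, p) for m, p in zip(mag, phase)]

# Toy values for three time-frequency bins (made up for illustration).
noisy_mag = [2.0, 4.0, 1.0]
mask = [0.5, 0.25, 1.0]
enhanced_mag = apply_magnitude_mask(noisy_mag, mask)
enhanced_phase = wrapped_phase([1.0, 0.0, -1.0], [0.0, 1.0, 0.0])
enhanced_spec = to_complex(enhanced_mag, enhanced_phase)
```

The anti-wrapping step is what makes a loss on wrapped phase meaningful: a prediction of 3.0 rad against a target of -3.0 rad is only about 0.28 rad away on the circle, not 6.0 rad.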

Code Repositories

yxlu-0102/MP-SENet (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
speech-enhancement-on-deep-noise-suppression | MP-SENet | PESQ-NB: 3.92; PESQ-WB: 3.62; SI-SDR-WB: 21.03
speech-enhancement-on-demand | MP-SENet | CBAK: 3.99; COVL: 4.34; CSIG: 4.81; PESQ (wb): 3.60; Para. (M): 2.26; STOI: 0.96
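For readers unfamiliar with the SI-SDR figure reported above, the following is a minimal sketch of the commonly used scale-invariant signal-to-distortion ratio in dB. It assumes both signals are zero-mean and equally long. Whether the benchmark applied mean removal or any other preprocessing is not stated on this page, and the function name is hypothetical.

```python
import math

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between an estimate and a reference signal.

    Assumes zero-mean inputs of equal length; raises ZeroDivisionError
    if the estimate is an exact scaled copy of the reference.
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy                      # optimal scaling of the reference
    target = [scale * r for r in reference]       # projection of estimate onto reference
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10(target_energy / residual_energy)

# Toy signals: the estimate is twice the reference plus an orthogonal error.
reference = [1.0, 0.0]
estimate = [2.0, 1.0]
score = si_sdr(estimate, reference)
```

Because the reference is rescaled before the comparison, multiplying the estimate by any non-zero gain leaves the score unchanged, which is why the metric is described as scale-invariant.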
