DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement

Dongheon Lee; Jung-Woo Choi

Abstract

In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral mask for suppressing the noise and reverberation embedded in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocks for aggregating information in the spatial, spectral, and temporal dimensions. It utilizes a spectral transformer with a modified feed-forward network and a temporal conformer with sequential dilated convolutions. The use of dense blocks and transformers dedicated to these three characteristics of audio signals enables more comprehensive enhancement in noisy and reverberant environments. Evaluated on two popular noisy and reverberant datasets, DeFT-AN outperforms state-of-the-art multichannel models across several speech quality and intelligibility metrics.
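To make the masking step concrete, the following is a minimal sketch of how a complex spectral mask can be applied to an STFT. It is not the authors' implementation: the array sizes, the choice of the first microphone as a reference channel, the random stand-in for the network output, and the helper name `apply_complex_mask` are all assumptions made for illustration.

```python
import numpy as np


def apply_complex_mask(noisy_stft, mask):
    """Multiply a complex mask element-wise with a complex STFT.

    Both arrays are assumed to have shape (freq_bins, frames).
    """
    if noisy_stft.shape != mask.shape:
        raise ValueError("mask and STFT must share the same (freq, time) shape")
    return mask * noisy_stft


rng = np.random.default_rng(0)

# Hypothetical sizes: 4 microphones, 257 frequency bins, 100 frames.
shape = (4, 257, 100)
multichannel_stft = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Assumption: the mask is applied to a single reference channel.
reference = multichannel_stft[0]

# Stand-in for a network output: two real-valued maps holding the
# real and imaginary parts of the mask.
mask_parts = rng.standard_normal((2, 257, 100))
mask = mask_parts[0] + 1j * mask_parts[1]

enhanced = apply_complex_mask(reference, mask)
```

Because the mask is complex, it can modify both the magnitude and the phase of each time-frequency bin, unlike a real-valued magnitude mask.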

Benchmarks

Benchmark | Method | PESQ | SI-SDR (dB) | STOI
speech-dereverberation-on-spatialized-wsjcam0 | DeFT-AN | 3.63 | 15.7 | 0.981
speech-enhancement-on-spatialized-dns | DeFT-AN | 3.01 | 9.9 | 0.924
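
For readers unfamiliar with SI-SDR (scale-invariant signal-to-distortion ratio, reported in dB), the sketch below shows one common way to compute it from time-domain signals. The zero-mean normalization and the `eps` stabilizer are assumptions of this sketch, and it may differ in detail from the evaluation code behind the scores above.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals of equal length."""
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)

    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target to find the optimal scaling.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    residual = estimate - projection

    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


# Toy signals: the distortion is orthogonal to the target and carries
# one tenth of its energy.
target = np.array([1.0, -1.0, 1.0, -1.0])
distortion = np.sqrt(0.1) * np.array([1.0, 1.0, -1.0, -1.0])
score = si_sdr(target + distortion, target)
```

Rescaling the estimate by any nonzero constant leaves the score essentially unchanged, which is the property that distinguishes SI-SDR from plain SDR.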
