Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

William Ravenscroft; Stefan Goetze; Thomas Hain

Abstract

Speech separation models are used for isolating individual speakers in many speech processing applications. Deep learning models have been shown to lead to state-of-the-art (SOTA) results on a number of speech separation benchmarks. One such class of models, known as temporal convolutional networks (TCNs), has shown promising results for speech separation tasks. A limitation of these models is that they have a fixed receptive field (RF). Recent research in speech dereverberation has shown that the optimal RF of a TCN varies with the reverberation characteristics of the speech signal. In this work, deformable convolution is proposed as a solution that allows TCN models to have dynamic RFs that can adapt to various reverberation times for reverberant speech separation. The proposed models achieve an 11.1 dB average scale-invariant signal-to-distortion ratio (SI-SDR) improvement over the input signal on the WHAMR benchmark. A relatively small deformable TCN model of 1.3M parameters is also proposed, which gives separation performance comparable to that of larger and more computationally complex models.
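To make the idea of a dynamic receptive field concrete, the following is a minimal sketch of a deformable 1-D convolution in plain Python. It is not the authors' implementation: the function names `linear_interp` and `deformable_conv1d` are hypothetical, the offsets are passed in directly rather than predicted from the input by a learned layer (as deformable convolution normally does), and zero padding outside the signal is an assumption. Each kernel tap reads the input at a fractional position, obtained by shifting its regular dilated position by an offset and interpolating linearly between the two neighbouring samples.

```python
import math


def linear_interp(x, pos):
    """Sample sequence x at a fractional index, treating out-of-range samples as zero."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return x[i] if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deformable_conv1d(x, weights, offsets, dilation=1):
    """Centred 1-D convolution whose taps are shifted by per-frame fractional offsets.

    x        : list of input samples (one channel)
    weights  : list of K kernel weights
    offsets  : offsets[t][k] is the shift applied to tap k at output frame t
               (assumed given here; normally predicted by a learned layer)
    dilation : spacing between the undeformed taps
    """
    num_taps = len(weights)
    centre = (num_taps - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(num_taps):
            # Regular dilated tap position plus the frame-specific offset.
            pos = t + (k - centre) * dilation + offsets[t][k]
            acc += weights[k] * linear_interp(x, pos)
        out.append(acc)
    return out
```

With all offsets set to zero this reduces to an ordinary dilated convolution with a fixed receptive field; non-zero offsets let each output frame widen, narrow, or shift the span of input it reads.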

Code Repositories

jwr1995/dtcn (official, PyTorch)
jwr1995/pubsep (PyTorch)

Benchmarks

| Benchmark | Methodology | MACs (G) | Parameters (M) | SDRi (dB) | SI-SDRi (dB) |
|---|---|---|---|---|---|
| Speech separation on WHAMR | Deformable TCN + Dynamic Mixing | 3.7 | 3.6 | 10.3 | 11.1 |
| Speech separation on WHAMR | Deformable TCN + Shared Weights + Dynamic Mixing | 3.7 | 1.3 | 9.5 | 10.1 |
| Speech separation on WSJ0-2mix | Deformable TCN + Dynamic Mixing | 3.7 | 3.6 | 17.4 | 17.2 |
| Speech separation on WSJ0-2mix | Deformable TCN + Shared Weights + Dynamic Mixing | 3.7 | 1.3 | 16.3 | 16.1 |
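The SI-SDRi figures above measure how much the scale-invariant signal-to-distortion ratio of the separated output improves over that of the unprocessed mixture. The sketch below uses the commonly cited definition of SI-SDR, in which the estimate is projected onto the reference and the energy of the projection is compared with the energy of the residual. The function names, the mean removal step, and the `eps` stabiliser are assumptions for illustration and are not taken from the listed repositories.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for two equal-length sequences."""
    n = len(reference)
    # Remove the mean of each signal (a common convention, assumed here).
    mean_est = sum(estimate) / n
    mean_ref = sum(reference) / n
    est = [e - mean_est for e in estimate]
    ref = [r - mean_ref for r in reference]

    # Project the estimate onto the reference to get the scaled target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything in the estimate that is not the scaled target counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the separated estimate minus SI-SDR of the input mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate by any non-zero constant leaves the score unchanged, which is what makes the metric scale-invariant.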
