
xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement

Nikolai Lund Kühne; Jan Østergaard; Jesper Jensen; Zheng-Hua Tan

Abstract

While attention-based architectures such as Conformers excel in speech enhancement, they face challenges such as limited scalability with respect to input sequence length. In contrast, the recently proposed Extended Long Short-Term Memory (xLSTM) architecture offers linear scalability. However, xLSTM-based models remain unexplored for speech enhancement. This paper introduces xLSTM-SENet, the first xLSTM-based single-channel speech enhancement system. A comparative analysis reveals that xLSTM (and notably, even LSTM) can match or outperform state-of-the-art Mamba- and Conformer-based systems across various model sizes in speech enhancement on the VoiceBank+DEMAND dataset. Through ablation studies, we identify key architectural design choices, such as exponential gating and bidirectionality, that contribute to its effectiveness. Our best xLSTM-based model, xLSTM-SENet2, outperforms state-of-the-art Mamba- and Conformer-based systems of similar complexity on the VoiceBank+DEMAND dataset.
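The abstract credits two design choices, exponential gating and bidirectionality, and contrasts the linear cost of xLSTM with attention. The block below is a minimal sketch of those ideas, not the xLSTM-SENet implementation: it assumes a single-unit, sLSTM-style cell (the paper's model may use a different xLSTM variant, such as the matrix-memory mLSTM), with an exponential input gate, a normalizer state, and a log-space stabilizer so that `exp()` never overflows. The function names `slstm_scalar` and `bidirectional` and the per-gate `(w, r, b)` parameter tuples are hypothetical and chosen only for illustration. Each time step does a constant amount of work, so a sequence of length T costs O(T).

```python
import math


def slstm_scalar(xs, params):
    """Hypothetical single-unit sLSTM-style recurrence with an exponential input gate.

    params maps each gate name ("z", "i", "f", "o") to an assumed
    (input weight, recurrent weight, bias) tuple. Returns one hidden
    state per input; each step is O(1), so the whole pass is O(len(xs)).
    """
    c, n, h, m = 0.0, 0.0, 0.0, 0.0  # cell, normalizer, hidden, stabilizer
    hs = []
    for x in xs:
        pre = {g: w * x + r * h + b for g, (w, r, b) in params.items()}
        z = math.tanh(pre["z"])                   # cell input in (-1, 1)
        o = 1.0 / (1.0 + math.exp(-pre["o"]))     # sigmoid output gate
        log_i = pre["i"]                          # exponential input gate: i = exp(pre_i)
        log_f = -math.log1p(math.exp(-pre["f"]))  # log of a sigmoid forget gate
        # Stabilizer: track the running max exponent so the stabilized gates are <= 1.
        m_new = max(log_f + m, log_i)
        i_s = math.exp(log_i - m_new)
        f_s = math.exp(log_f + m - m_new)
        c = f_s * c + i_s * z
        n = f_s * n + i_s                         # normalizer: c / n is a weighted mean of z
        h = o * c / n
        m = m_new
        hs.append(h)
    return hs


def bidirectional(xs, params_fwd, params_bwd):
    """Run one pass forward and one over the reversed sequence, then pair them per frame."""
    fwd = slstm_scalar(xs, params_fwd)
    bwd = slstm_scalar(xs[::-1], params_bwd)[::-1]
    return list(zip(fwd, bwd))


if __name__ == "__main__":
    # Arbitrary illustrative parameters, not trained values.
    p = {"z": (1.0, 0.5, 0.0), "i": (1.0, 0.0, 0.0),
         "f": (0.0, 0.0, 3.0), "o": (0.0, 0.0, 0.0)}
    print(bidirectional([0.1, -0.4, 0.9, 0.3], p, p))
```

Because the stabilized gates `i_s` and `f_s` never exceed 1, a large input-gate pre-activation (for example 1000) runs without overflow, whereas a naive `math.exp(1000)` would raise `OverflowError`. The backward pass gives every frame access to future context, which is the intuition behind the bidirectionality the ablations highlight; the pairing by `zip` here stands in for whatever combination a real model uses.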

Code Repositories

nikolaikyhne/xlstm-senet (official, PyTorch)

Benchmarks

Benchmark	Methodology	PESQ (wb)	CSIG	CBAK	COVL	STOI	Params (M)
speech-enhancement-on-demand	xLSTM-SENet2	3.53	4.78	3.98	4.27	0.96	2.27
