Kühne Nikolai Lund ; Østergaard Jan ; Jensen Jesper ; Tan Zheng-Hua

Abstract
While attention-based architectures, such as Conformers, excel in speech enhancement, they face challenges such as scalability with respect to input sequence length. In contrast, the recently proposed Extended Long Short-Term Memory (xLSTM) architecture offers linear scalability. However, xLSTM-based models remain unexplored for speech enhancement. This paper introduces xLSTM-SENet, the first xLSTM-based single-channel speech enhancement system. A comparative analysis reveals that xLSTM (and notably, even LSTM) can match or outperform state-of-the-art Mamba- and Conformer-based systems across various model sizes in speech enhancement on the VoiceBank+DEMAND dataset. Through ablation studies, we identify key architectural design choices, such as exponential gating and bidirectionality, contributing to its effectiveness. Our best xLSTM-based model, xLSTM-SENet2, outperforms state-of-the-art Mamba- and Conformer-based systems of similar complexity on the VoiceBank+DEMAND dataset.
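The abstract names exponential gating and bidirectionality as key design choices and contrasts linear scaling with attention. The following is a minimal scalar sketch of those two ideas, not the xLSTM-SENet implementation: the function names `exp_gated_scan` and `bidirectional_scan` are hypothetical, the gate pre-activations are passed in directly instead of being learned, and averaging the two directions is an assumption made for illustration. The running-max stabilizer `m` follows the commonly described way of keeping exponential gates from overflowing.

```python
import math

def exp_gated_scan(z, i_pre, f_pre):
    """One pass over a scalar sequence with exponential input/forget gates.

    z      : candidate values, one per time step
    i_pre  : input-gate pre-activations (treated as log-gates)
    f_pre  : forget-gate pre-activations (treated as log-gates)
    Returns h_t = c_t / n_t for every step. Cost is linear in len(z).
    """
    c, n, m = 0.0, 0.0, -math.inf  # cell state, normalizer, stabilizer
    out = []
    for z_t, i_t, f_t in zip(z, i_pre, f_pre):
        m_new = max(f_t + m, i_t)            # running max of the log-gates
        i_gate = math.exp(i_t - m_new)       # stabilized exponential input gate
        f_gate = math.exp(f_t + m - m_new)   # stabilized exponential forget gate
        c = f_gate * c + i_gate * z_t        # gated accumulation of values
        n = f_gate * n + i_gate              # matching accumulation of gate mass
        m = m_new
        out.append(c / n)                    # normalized hidden value
    return out

def bidirectional_scan(z, i_pre, f_pre):
    """Run the scan forward and backward, then average (an assumed merge rule)."""
    fwd = exp_gated_scan(z, i_pre, f_pre)
    bwd = exp_gated_scan(z[::-1], i_pre[::-1], f_pre[::-1])[::-1]
    return [(a + b) / 2 for a, b in zip(fwd, bwd)]
```

With all pre-activations set to zero, the forward scan reduces to a running mean of `z`, which makes the normalizer's role easy to see. A real model would use separate learned parameters per direction and vector or matrix states; this sketch only shows why each direction needs a single pass over the sequence.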
Benchmarks
| Benchmark | Model | PESQ (wb) | CSIG | CBAK | COVL | STOI | Params (M) |
|---|---|---|---|---|---|---|---|
| speech-enhancement-on-demand | xLSTM-SENet2 | 3.53 | 4.78 | 3.98 | 4.27 | 0.96 | 2.27 |