Command Palette
Search for a command to run...
Hyun Joon Park Byung Ha Kang Wooseok Shin Jin Sob Kim Sung Won Han

Abstract
In the field of speech enhancement, time domain methods have difficulties in achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representations and poor memory efficiency. In this study, we propose Multi-view Attention Network for Noise ERasure (MANNER) consisting of a convolutional encoder-decoder with a multi-view attention block, applied to the time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset in terms of five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while efficiently processing noisy speech.
Code Repositories
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| speech-enhancement-on-demand | MANNER | CBAK: 3.65 COVL: 3.91 CSIG: 4.53 PESQ (wb): 3.21 STOI: 95 |
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.