3 months ago

An efficient encoder-decoder architecture with top-down attention for speech separation

Kai Li, Runxuan Yang, Xiaolin Hu

Abstract

Deep neural networks have shown excellent prospects in speech separation tasks. However, obtaining good results while keeping model complexity low remains challenging in real-world applications. In this paper, we propose TDANet, a bio-inspired, efficient encoder-decoder architecture that mimics the brain's top-down attention and reduces model complexity without sacrificing performance. The top-down attention in TDANet is extracted by the global attention (GA) module and the cascaded local attention (LA) layers. The GA module takes multi-scale acoustic features as input to extract a global attention signal, which then modulates features of different scales through direct top-down connections. The LA layers use features of adjacent layers as input to extract a local attention signal, which is used to modulate the lateral input in a top-down manner. On three benchmark datasets, TDANet consistently achieved separation performance competitive with previous state-of-the-art (SOTA) methods at higher efficiency. Specifically, TDANet's multiply-accumulate operations (MACs) are only 5% of those of Sepformer, one of the previous SOTA models, and its CPU inference time is only 10% of Sepformer's. In addition, a large-size version of TDANet obtained SOTA results on three datasets, with MACs still only 10% of Sepformer's and CPU inference time only 24% of Sepformer's.
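To make the two modulation paths described in the abstract more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the abstract does not specify the operators, so the average pooling, nearest-neighbour upsampling, summation-based fusion, and sigmoid gating are all assumptions chosen for brevity, and every function name is hypothetical. Features are represented as 1-D lists, one per scale.

```python
import math


def avg_pool_to(x, length):
    """Average-pool a 1-D list down to `length` (len(x) must be a multiple of it)."""
    step = len(x) // length
    return [sum(x[i * step:(i + 1) * step]) / step for i in range(length)]


def upsample_to(x, length):
    """Nearest-neighbour upsample a 1-D list to `length` (a multiple of len(x))."""
    rep = length // len(x)
    return [v for v in x for _ in range(rep)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def global_top_down_modulation(features):
    """Toy stand-in for the GA path (assumed operators, not the paper's).

    `features` is a list of 1-D lists ordered from finest to coarsest scale.
    All scales are pooled to the coarsest length and summed; a sigmoid of
    that sum acts as the global attention signal, which is upsampled and
    multiplied into every scale.
    """
    coarsest = len(features[-1])
    pooled = [avg_pool_to(f, coarsest) for f in features]
    fused = [sum(vals) for vals in zip(*pooled)]
    gate = [sigmoid(v) for v in fused]
    return [
        [a * g for a, g in zip(f, upsample_to(gate, len(f)))]
        for f in features
    ]


def local_top_down_modulation(lateral, coarser):
    """Toy stand-in for one LA layer (assumed operators, not the paper's).

    The adjacent coarser-scale feature is turned into a gate that
    modulates the lateral (finer-scale) input.
    """
    gate = [sigmoid(v) for v in upsample_to(coarser, len(lateral))]
    return [a * g for a, g in zip(lateral, gate)]
```

Calling `global_top_down_modulation([[1.0] * 8, [1.0] * 4, [1.0] * 2])` returns three lists with the original lengths 8, 4, and 2, each scaled by the same global gate. The point of the sketch is only the data flow: one signal computed from all scales modulates every scale, while each local gate is computed from an adjacent layer.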

Code Repositories

JusperLee/TDANet (official, PyTorch)

Benchmarks

Benchmark                        Methodology    Metrics
speech-separation-on-libri2mix   TDANet         SI-SDRi: 16.9
speech-separation-on-libri2mix   TDANet Large   SI-SDRi: 17.4
speech-separation-on-wham        TDANet Large   SI-SDRi: 15.2
speech-separation-on-wham        TDANet         SI-SDRi: 14.8
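
The SI-SDRi figures above are improvements in scale-invariant signal-to-distortion ratio, in dB, of the separated output over the unprocessed mixture. As a reference for how such a number is typically computed, here is a minimal sketch in plain Python using the standard SI-SDR definition. The function names and the small `eps` stabiliser are assumptions for illustration; this is not taken from the TDANet repository, and benchmark evaluation additionally involves details such as resolving speaker permutations that are not shown here.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    n = len(target)
    # Remove the mean from both signals, as is conventional for SI-SDR.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to get the scaled target part.
    alpha = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    s_target = [alpha * b for b in t]
    # Everything left over counts as distortion.
    e_noise = [a - s for a, s in zip(e, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise)
    return 10.0 * math.log10((num + eps) / (den + eps))


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the input mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate by any nonzero constant leaves the score unchanged, which is what distinguishes SI-SDR from plain SDR.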
