3 months ago

Compute and memory efficient universal sound source separation

Efthymios Tzinis, Zhepei Wang, Xilin Jiang, Paris Smaragdis

Abstract

Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem. In this study, we provide a family of efficient neural network architectures for general purpose audio source separation while focusing on multiple computational aspects that hinder the application of neural networks in real-world scenarios. The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF) as well as their aggregation, which is performed through simple one-dimensional convolutions. This mechanism enables our models to obtain high fidelity signal separation in a wide variety of settings where a variable number of sources is present and with limited computational resources (e.g., floating point operations, memory footprint, number of parameters and latency). Our experiments show that SuDoRM-RF models perform comparably to, and even surpass, several state-of-the-art benchmarks that have significantly higher computational resource requirements. The causal variation of SuDoRM-RF is able to obtain competitive performance in real-time speech separation of around 10 dB scale-invariant signal-to-distortion ratio improvement (SI-SDRi) while remaining up to 20 times faster than real-time on a laptop device.
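The successive downsampling and resampling idea described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes average pooling in place of learned strided convolutions, nearest-neighbour upsampling, and plain summation as the aggregation step. The function names and the `depth` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def downsample(x: np.ndarray) -> np.ndarray:
    """Halve the time axis by averaging adjacent frames.

    Assumption: this stands in for a learned strided 1-D convolution.
    """
    channels, length = x.shape
    if length % 2:  # pad odd-length inputs by repeating the last frame
        x = np.concatenate([x, x[:, -1:]], axis=1)
    return x.reshape(channels, -1, 2).mean(axis=2)


def upsample(x: np.ndarray, length: int) -> np.ndarray:
    """Nearest-neighbour upsampling by a factor of 2, cropped to `length`."""
    return np.repeat(x, 2, axis=1)[:, :length]


def multi_resolution_block(x: np.ndarray, depth: int = 4) -> np.ndarray:
    """Extract features at `depth` resolutions and merge them back together."""
    # Successive downsampling: each level has half the time resolution.
    features = [x]
    for _ in range(depth - 1):
        features.append(downsample(features[-1]))

    # Successive resampling: start at the coarsest level, upsample,
    # and add the result to the next finer level (illustrative aggregation).
    out = features[-1]
    for finer in reversed(features[:-1]):
        out = finer + upsample(out, finer.shape[1])
    return out


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((8, 100))  # (channels, time)
    y = multi_resolution_block(x, depth=4)
    print(y.shape)  # same shape as the input
```

Because every level operates on a shorter sequence than the one before it, most of the work happens at reduced time resolution, which is one plausible reading of how such a block keeps operation counts and memory low.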

Code Repositories

etzinis/sudo_rm_rf (official; PyTorch; mentioned in GitHub)
udase-chime2023/baseline (PyTorch; mentioned in GitHub)
etzinis/unsup_speech_enh_adaptation (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark                        Methodology                    Metrics
speech-separation-on-whamr       Improved Sudo rm -rf (U=36)    SI-SDRi: 13.5
speech-separation-on-wsj0-2mix   Sudo rm -rf (U=36)             SI-SDRi: 19.5
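For readers unfamiliar with the metric reported above, the following is a minimal sketch of how SI-SDR and its improvement over the unprocessed mixture (SI-SDRi) are commonly computed. It is not the evaluation code used to produce these numbers. The mean removal and the `eps` stabiliser are assumptions that follow common practice, and the function names are hypothetical.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    # Assumption: remove the mean of both signals, a common convention.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target; this makes the metric
    # insensitive to rescaling of the estimate.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection

    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return float(10.0 * np.log10(ratio))


def si_sdr_improvement(estimate: np.ndarray, mixture: np.ndarray,
                       target: np.ndarray) -> float:
    """SI-SDRi: gain of the separated estimate over the input mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(8000) / 8000.0
    target = np.sin(2 * np.pi * 220.0 * t)       # synthetic "source"
    interference = rng.standard_normal(t.size)   # synthetic "other source"
    mixture = target + interference
    estimate = target + 0.1 * interference       # pretend separator output
    print(si_sdr_improvement(estimate, mixture, target))
```

A positive SI-SDRi means the estimate is closer to the reference source than the mixture was, so higher values in the table indicate better separation.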
