Eliya Nachmani, Yossi Adi, Lior Wolf

Abstract
We present a new method for separating a mixed audio sequence in which multiple voices speak simultaneously. The method employs gated neural networks that are trained to separate the voices over multiple processing steps, while keeping the speaker assigned to each output channel fixed. A separate model is trained for each possible number of speakers, and the model with the largest number of speakers is used to infer the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.
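The speaker-count selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical rule in which the largest (e.g. 5-speaker) model is run on the mixture and the number of output channels whose energy exceeds a silence threshold, relative to the loudest channel, is taken as the speaker count. The function name and threshold are illustrative assumptions.

```python
import numpy as np

def estimate_num_speakers(outputs, silence_threshold_db=-20.0):
    """Hypothetical selection rule: given the C output channels of the
    largest (C-speaker) separation model, count the channels whose
    energy is within `silence_threshold_db` of the loudest channel;
    the rest are treated as silent (unused) outputs."""
    # mean power of each output channel
    energies = np.array([float(np.mean(ch ** 2)) for ch in outputs])
    # energy of each channel in dB relative to the loudest one
    rel_db = 10.0 * np.log10(energies / energies.max() + 1e-12)
    return int(np.sum(rel_db > silence_threshold_db))

# Toy demo: a 5-channel output where only 2 channels carry signal.
rng = np.random.default_rng(0)
active = [rng.normal(0.0, 1.0, 16000) for _ in range(2)]
silent = [rng.normal(0.0, 0.001, 16000) for _ in range(3)]
print(estimate_num_speakers(active + silent))  # → 2
```

In practice, the paper trains one model per speaker count and picks the count from the largest model's outputs; the energy threshold here stands in for whatever decision rule is used.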
Code Repositories
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| Speech Separation on WHAMR! | VSUNOS | SI-SDRi: 12.2 |
| Speech Separation on WSJ0-2mix | Gated DualPathRNN | SI-SDRi: 20.12 |
| Speech Separation on WSJ0-3mix | Gated DualPathRNN | SI-SDRi: 16.85 |
| Speech Separation on WSJ0-4mix | Gated DualPathRNN | SI-SDRi: 12.88 |
| Speech Separation on WSJ0-5mix | Gated DualPathRNN | SI-SDRi: 10.56 |