HyperAI
Wavesplit: End-to-End Speech Separation by Speaker Clustering

Neil Zeghidour, David Grangier

Abstract

We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the model infers a representation for each source and then estimates each source signal given the inferred representations. The model is trained to jointly perform both tasks from the raw waveform. Wavesplit infers a set of source representations via clustering, which addresses the fundamental permutation problem of separation. For speech separation, our sequence-wide speaker representations provide a more robust separation of long, challenging recordings compared to prior work. Wavesplit redefines the state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2/3mix), as well as in noisy and reverberated settings (WHAM/WHAMR). We also set a new benchmark on the recent LibriMix dataset. Finally, we show that Wavesplit is also applicable to other domains, by separating fetal and maternal heart rates from a single abdominal electrocardiogram.
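The abstract notes that clustering sequence-wide speaker representations addresses the fundamental permutation problem: a separation model emits sources in arbitrary order, so a naive loss can penalize a perfect estimate that is merely permuted. As a point of contrast, here is a minimal NumPy sketch of the classic brute-force fix (permutation-invariant training); this is an illustration of the problem, not Wavesplit's actual objective:

```python
import itertools
import numpy as np

def permutation_invariant_mse(estimates, references):
    """Return the smallest mean-squared error over all source orderings.

    This O(n!) search over permutations is the classic workaround for the
    permutation problem. Wavesplit instead resolves the ordering once per
    recording by clustering its inferred speaker vectors.
    """
    n = len(references)
    best = float("inf")
    for perm in itertools.permutations(range(n)):
        # Average per-source MSE under this assignment of estimates to references.
        loss = sum(
            np.mean((estimates[p] - references[i]) ** 2)
            for i, p in enumerate(perm)
        ) / n
        best = min(best, loss)
    return best
```

With two or three speakers the search is cheap, but per-frame permutation decisions can still flip speakers mid-recording — the failure mode on long recordings that Wavesplit's sequence-wide representations are designed to avoid.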

Benchmarks

Benchmark                        Methodology    Metrics
Speech Separation on WHAMR       Wavesplit      SI-SDRi: 13.2
Speech Separation on WSJ0-2mix   Wavesplit v2   SDRi: 22.3, SI-SDRi: 22.2
Speech Separation on WSJ0-2mix   Wavesplit v1   SI-SDRi: 19.0
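SI-SDRi in the table above is the improvement in scale-invariant signal-to-distortion ratio over the unprocessed mixture. A minimal NumPy sketch of the underlying SI-SDR metric (the standard definition, not code from the paper):

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB: project the estimate onto the reference,
    then compare the energy of that target component to the residual."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    alpha = estimate @ reference / (reference @ reference + eps)  # optimal scale
    target = alpha * reference
    residual = estimate - target
    return 10 * np.log10((target @ target + eps) / (residual @ residual + eps))
```

A model's SI-SDRi on one recording is then `si_sdr(estimate, reference) - si_sdr(mixture, reference)`, averaged over the test set.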
