Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training
Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-yi Lee

Abstract
Speech separation has advanced considerably with the highly successful permutation invariant training (PIT) approach, but the frequent label assignment switching that occurs during PIT training remains a problem, limiting convergence speed and achievable performance. In this paper, we propose self-supervised pre-training to stabilize the label assignment when training the speech separation model. Experiments over several types of self-supervised approaches, several typical speech separation models, and two different datasets showed that substantial improvements are achievable if a proper self-supervised approach is chosen.
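To make the label assignment issue concrete, below is a minimal sketch of a PIT-style objective: the loss is computed for every permutation of target sources, and the minimum is taken per utterance. It uses mean-squared error as a stand-in criterion (the paper's actual training loss may differ, e.g. an SI-SDR-based one), and all tensor shapes and names are illustrative assumptions, not the authors' implementation.

```python
import itertools
import torch

def pit_loss(estimates: torch.Tensor, targets: torch.Tensor):
    """Permutation invariant training (PIT) loss sketch.

    estimates, targets: (batch, n_sources, time) tensors.
    Returns the mean of the per-utterance minimum loss over all
    source permutations, plus the chosen permutation indices.
    """
    n_src = estimates.shape[1]
    per_perm_losses = []
    for perm in itertools.permutations(range(n_src)):
        permuted = targets[:, list(perm), :]
        # MSE per batch element under this label assignment
        per_perm_losses.append(((estimates - permuted) ** 2).mean(dim=(1, 2)))
    per_perm_losses = torch.stack(per_perm_losses, dim=1)  # (batch, n_src!)
    min_loss, best_perm = per_perm_losses.min(dim=1)       # best assignment per utterance
    return min_loss.mean(), best_perm
```

The `best_perm` index is the "label assignment" the abstract refers to: when it flips between permutations across training steps for the same utterance, the gradient targets change abruptly, which is the instability that the proposed self-supervised pre-training aims to reduce.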
Benchmarks
| Benchmark | Methodology | SDRi (dB) | SI-SDRi (dB) |
|---|---|---|---|
| speech-separation-on-libri2mix | Conv-TasNet (Libri1Mix speech enhancement pre-trained) | 14.6 | 14.1 |
| speech-separation-on-libri2mix | Conv-TasNet (Libri1Mix speech enhancement multi-task) | 14.1 | 13.7 |
| speech-separation-on-libri2mix | Conv-TasNet | 13.6 | 13.2 |
| speech-separation-on-wsj0-2mix | DPTNet (Libri1Mix speech enhancement pre-trained) | 21.5 | 21.3 |