Singing Voice Separation with Deep U-Net Convolutional Networks
Tillman Weyde, Aparna Kumar, Rachel Bittner, Nicola Montecchio, Eric Humphrey, Andreas Jansson
Abstract
The decomposition of a music audio signal into its vocal and backing track components is analogous to image-to-image translation, where a mixed spectrogram is transformed into its constituent sources. We propose a novel application of the U-Net architecture, initially developed for medical imaging, for the task of source separation, given its proven capacity for recreating the fine, low-level detail required for high-quality audio reproduction. Through both quantitative evaluation and subjective assessment, experiments demonstrate that the proposed algorithm achieves state-of-the-art performance.
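To make the "mixed spectrogram to constituent sources" idea concrete, the sketch below shows one common way such a transformation can be realised: a soft mask with values in [0, 1] is multiplied element-wise with the mixture magnitude spectrogram. This is an illustrative assumption, not a statement of the authors' exact pipeline. The function names `apply_soft_mask` and `complement_mask` are hypothetical, the mask here is hand-written rather than predicted by a network, and deriving the backing track from the complementary mask is a simplification chosen for the example.

```python
def apply_soft_mask(mixture_mag, mask):
    """Multiply a magnitude spectrogram by a soft mask, element-wise.

    Both arguments are lists of rows (frequency bins x time frames).
    Mask values are clipped to [0, 1] before use.
    """
    if len(mixture_mag) != len(mask):
        raise ValueError("spectrogram and mask must have the same shape")
    separated = []
    for mix_row, mask_row in zip(mixture_mag, mask):
        if len(mix_row) != len(mask_row):
            raise ValueError("spectrogram and mask must have the same shape")
        separated.append(
            [m * min(1.0, max(0.0, w)) for m, w in zip(mix_row, mask_row)]
        )
    return separated


def complement_mask(mask):
    """Return 1 - mask (after clipping), an assumed mask for the other source."""
    return [[1.0 - min(1.0, max(0.0, w)) for w in row] for row in mask]


# Toy 2x2 "spectrogram"; in practice the mask would come from a trained model.
mixture = [[1.0, 2.0], [4.0, 0.5]]
vocal_mask = [[0.25, 0.5], [1.0, 0.0]]

vocal_mag = apply_soft_mask(mixture, vocal_mask)
backing_mag = apply_soft_mask(mixture, complement_mask(vocal_mask))
print(vocal_mag)
print(backing_mag)
```

With complementary masks the two estimated magnitudes sum back to the mixture magnitude, which is a convenient sanity check for a toy setup like this one.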
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| speech-separation-on-ikala | U-Net | NSDR: 11.094 (Vocal); 14.435 (Instrumental) |
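The NSDR figures in the table are normalized signal-to-distortion ratios: the SDR of the separated estimate minus the SDR of the unprocessed mixture, both measured against the same reference source. The sketch below illustrates that difference using a simplified SDR (reference energy over residual-error energy, in dB). This is an assumption made for brevity; published evaluations typically use the BSS Eval definition of SDR, which is more involved, so the numbers produced here are not comparable to the table. The function names `sdr_db` and `nsdr_db` are hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over residual-error energy.

    NOTE: this is not the BSS Eval SDR; it is a plain energy ratio used
    only to illustrate how the normalization works.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference signal is silent")
    if error_energy == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_energy / error_energy)


def nsdr_db(reference, estimate, mixture):
    """Improvement of the estimate's SDR over the raw mixture's SDR."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


# Toy signals: a "vocal" sinusoid plus a quieter "backing" sinusoid.
vocal = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
backing = [0.5 * math.sin(2 * math.pi * 11 * n / 100) for n in range(100)]
mixture = [v + b for v, b in zip(vocal, backing)]

# Hypothetical separator output that leaves 10% of the backing amplitude.
estimate = [v + 0.1 * b for v, b in zip(vocal, backing)]

print(round(nsdr_db(vocal, estimate, mixture), 3))
```

In this toy case the residual interference is reduced to one tenth of its amplitude, so the normalized score comes out at about 20 dB regardless of how loud the backing track was to begin with. That independence from the starting mixture level is the point of the normalization.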