
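Abstract
The performance of music source separation (MSS) models has improved substantially in recent years, thanks to new neural network architectures and training pipelines. Most current MSS model designs, however, are motivated by other audio processing tasks or other research fields, and do not fully exploit the intrinsic characteristics and patterns of music signals. This paper proposes band-split RNN (BSRNN), a frequency-domain model that explicitly splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling. The bandwidths of the subbands can be chosen from a priori or expert knowledge about the target source, so that performance is optimized for a particular type of instrument. To make use of unlabeled data, the paper also describes a semi-supervised fine-tuning pipeline that further improves the model. Experimental results show that BSRNN trained only on the MUSDB18-HQ dataset significantly outperforms several top-ranking models from the Music Demixing (MDX) Challenge 2021, and that the semi-supervised fine-tuning stage further improves separation on all four instrument tracks.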
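The band-split step described in the abstract can be illustrated with a minimal sketch in plain Python. It only covers how STFT frequency bins might be grouped into subbands; the interleaved band-level and sequence-level RNN modeling is not shown. The function name `split_into_bands`, the 44.1 kHz sample rate, the 2048-point FFT, and the bandwidth scheme are all assumptions chosen for illustration, not the settings used in the paper.

```python
def split_into_bands(sample_rate, n_fft, scheme):
    """Group STFT bins into contiguous subbands.

    scheme is a list of (bandwidth_hz, upper_limit_hz) pairs applied in
    order; bins above the last upper limit form one final band.
    Returns a list of (start_bin, end_bin) pairs, end exclusive.
    """
    n_bins = n_fft // 2 + 1           # bins of a one-sided spectrum
    hz_per_bin = sample_rate / n_fft  # frequency resolution per bin
    bands = []
    start = 0
    for bandwidth_hz, upper_hz in scheme:
        width = max(1, round(bandwidth_hz / hz_per_bin))
        limit = min(n_bins, round(upper_hz / hz_per_bin))
        while start < limit:
            end = min(start + width, limit)
            bands.append((start, end))
            start = end
    if start < n_bins:
        bands.append((start, n_bins))  # everything that is left over
    return bands


# Hypothetical scheme: narrow bands at low frequencies, wider ones higher up.
scheme = [(100, 1000), (250, 4000), (1000, 16000)]
bands = split_into_bands(44100, 2048, scheme)

# Slice one spectrogram frame (dummy values here) into its subbands.
frame = [0.0] * (2048 // 2 + 1)
subbands = [frame[s:e] for s, e in bands]
print(len(bands), bands[:3], bands[-1])
```

A finer split in the frequency range where the target instrument carries most of its energy is the kind of prior knowledge the abstract refers to; in a full model each subband would then be projected to a common feature size before band-level and sequence-level modeling.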
Code repositories
amanteur/BandSplitRNN-Pytorch (PyTorch, mentioned on GitHub)
naba89/iseparate-sdx (PyTorch, mentioned on GitHub)
crlandsc/music-demixing-with-band-split-rnn (PyTorch, mentioned on GitHub)
Benchmarks
| Benchmark | Method | SDR avg (dB) | SDR bass (dB) | SDR drums (dB) | SDR other (dB) | SDR vocals (dB) |
|---|---|---|---|---|---|---|
| music-source-separation-on-musdb18 | Band-Split RNN (semi-sup.) | 8.97 | 8.16 | 10.15 | 7.08 | 10.47 |
| music-source-separation-on-musdb18 | Band-Split RNN | 8.23 | 7.51 | 8.58 | 6.62 | 10.21 |
| music-source-separation-on-musdb18-hq | Band-Split RNN (semi-sup.) | 8.97 | 8.16 | 10.15 | 7.08 | 10.47 |
| music-source-separation-on-musdb18-hq | Band-Split RNN | 8.24 | 7.22 | 9.01 | 6.70 | 10.01 |
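
The claim that semi-supervised fine-tuning helps on all four tracks can be checked directly against the MUSDB18-HQ rows of the table. The short sketch below copies those scores and computes the per-stem gain and the mean over stems. The reported averages are consistent with a plain arithmetic mean of the four stems (8.965 and 8.235 before rounding), but that is an observation from the numbers, not a statement of how the benchmark defines its average.

```python
# SDR scores in dB, copied from the MUSDB18-HQ rows of the table above.
baseline = {"bass": 7.22, "drums": 9.01, "other": 6.70, "vocals": 10.01}
semi_sup = {"bass": 8.16, "drums": 10.15, "other": 7.08, "vocals": 10.47}


def mean(scores):
    """Arithmetic mean of the per-stem scores."""
    return sum(scores.values()) / len(scores)


# Per-stem improvement from the semi-supervised fine-tuning stage.
gains = {stem: semi_sup[stem] - baseline[stem] for stem in baseline}

for stem, gain in gains.items():
    print(f"{stem}: {gain:+.2f} dB")
print(f"mean baseline: {mean(baseline):.3f} dB")
print(f"mean semi-sup: {mean(semi_sup):.3f} dB")
```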