Hervé Bredin, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, Marie-Philippe Gill

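Abstract
We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Built on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also ships with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance on most of these tasks.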
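
The detection tasks named in the abstract (voice activity, speaker change, overlapped speech) all end with a per-frame decision that has to be turned into time segments. The following is a minimal sketch of that last step only, assuming a model has already produced one speech probability per frame. The function name `scores_to_segments`, the `frame_step` value, and the fixed `threshold` are hypothetical choices for illustration and are not pyannote.audio's API.

```python
def scores_to_segments(scores, frame_step=0.01, threshold=0.5):
    """Turn per-frame speech probabilities into (start, end) segments in seconds.

    Assumptions (not taken from pyannote.audio): frames are evenly spaced by
    `frame_step` seconds and a frame counts as speech when score >= threshold.
    """
    segments = []
    start = None  # index of the first frame of the currently open segment
    for i, score in enumerate(scores):
        is_speech = score >= threshold
        if is_speech and start is None:
            start = i  # a speech region begins at this frame
        elif not is_speech and start is not None:
            segments.append((start * frame_step, i * frame_step))
            start = None  # the speech region ended just before this frame
    if start is not None:
        # The audio ended while a speech region was still open.
        segments.append((start * frame_step, len(scores) * frame_step))
    return segments


# Five frames, one second apart, with speech in frames 1-2 and frame 4.
print(scores_to_segments([0.1, 0.9, 0.8, 0.2, 0.7], frame_step=1.0))
```

A single fixed threshold is the simplest possible rule; a real pipeline would typically tune this decision step on development data.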
Code repositories
pyannote/pyannote-audio (official, PyTorch)
muskang48/Speaker-Diarization (TensorFlow, mentioned on GitHub)
MarvinLvn/voice-type-classifier (mentioned on GitHub)
Benchmarks
| Benchmark | Method | DER (%) | FA (%) | Miss (%) |
|---|---|---|---|---|
| speaker-diarization-on-ami | pyannote (MFCC) | 6.3 | 3.5 | 2.7 |
| speaker-diarization-on-ami | pyannote (waveform) | 6.0 | 3.6 | 2.4 |
| speaker-diarization-on-dihard-1 | pyannote (MFCC) | 10.5 | 6.8 | 3.7 |
| speaker-diarization-on-dihard-1 | Baseline (the best result in the literature as of Oct. 2019) | 11.2 | 6.5 | 4.7 |
| speaker-diarization-on-dihard-1 | pyannote (waveform) | 9.9 | 5.7 | 4.2 |
| speaker-diarization-on-etape | Baseline | 7.7 | 7.5 | 0.2 |
| speaker-diarization-on-etape | pyannote (MFCC) | 5.6 | 5.2 | 0.4 |
| speaker-diarization-on-etape | pyannote (waveform) | 4.9 | 4.2 | 0.7 |
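
The table reports no speaker confusion term, and in every row the DER value matches the sum of the false alarm (FA) and miss rates up to rounding. The short check below recomputes that sum for each row. It is a sketch under the assumption that all three figures are percentages of the same reference duration; the names `ROWS`, `detection_error`, and `check_rows`, and the 0.15 tolerance for rounding, are illustrative choices.

```python
# Rows copied from the table above: (benchmark, method, DER, FA, Miss), all in %.
ROWS = [
    ("ami", "pyannote (MFCC)", 6.3, 3.5, 2.7),
    ("ami", "pyannote (waveform)", 6.0, 3.6, 2.4),
    ("dihard-1", "pyannote (MFCC)", 10.5, 6.8, 3.7),
    ("dihard-1", "Baseline", 11.2, 6.5, 4.7),
    ("dihard-1", "pyannote (waveform)", 9.9, 5.7, 4.2),
    ("etape", "Baseline", 7.7, 7.5, 0.2),
    ("etape", "pyannote (MFCC)", 5.6, 5.2, 0.4),
    ("etape", "pyannote (waveform)", 4.9, 4.2, 0.7),
]


def detection_error(fa, miss):
    """Error rate as false alarm plus miss (no speaker confusion term)."""
    return fa + miss


def check_rows(rows, tol=0.15):
    """Return (benchmark, method, reported, recomputed, consistent) per row.

    `tol` allows for the reported figures being rounded to one decimal place.
    """
    results = []
    for benchmark, method, reported, fa, miss in rows:
        recomputed = detection_error(fa, miss)
        consistent = abs(reported - recomputed) <= tol
        results.append((benchmark, method, reported, round(recomputed, 1), consistent))
    return results


for row in check_rows(ROWS):
    print(row)
```

Only the first AMI row differs between the reported and recomputed values (6.3 against 6.2), which is consistent with each figure having been rounded independently.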