3 months ago

Supervised online diarization with sample mean loss for multi-domain data

Enrico Fini, Alessio Brutti

Abstract

Recently, a fully supervised speaker diarization approach (UIS-RNN) was proposed, which models speakers using multiple instances of a parameter-sharing recurrent neural network. In this paper we propose qualitative modifications to the model that significantly improve the learning efficiency and the overall diarization performance. In particular, we introduce a novel loss function, which we call Sample Mean Loss, and we present a better modelling of the speaker turn behaviour by devising an analytical expression to compute the probability of a new speaker joining the conversation. In addition, we demonstrate that our model can be trained on fixed-length speech segments, removing the need for speaker change information in inference. Using x-vectors as input features, we evaluate our proposed approach on the multi-domain dataset employed in the DIHARD II challenge: our online method improves on the original UIS-RNN and achieves performance similar to an offline agglomerative clustering baseline using PLDA scoring.
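
The abstract names the Sample Mean Loss but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: the model's predicted speaker representation is compared, by squared Euclidean distance, against the mean of a random sample of that speaker's embeddings. The function names, the distance measure, and the sampling scheme are all assumptions made for this example and are not taken from the paper or the official repository.

```python
import random


def sample_mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sample_mean_loss(prediction, speaker_embeddings, sample_size, rng=None):
    """Squared L2 distance between `prediction` and the mean of a random
    subset of one speaker's embeddings.

    ASSUMPTION: this form is an illustrative guess at what a
    "sample mean" loss could look like; it is not the paper's definition.
    """
    if rng is None:
        rng = random.Random()
    # Never ask for more samples than the speaker has embeddings.
    k = min(sample_size, len(speaker_embeddings))
    subset = rng.sample(speaker_embeddings, k)
    target = sample_mean(subset)
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


# Toy data: three 2-dimensional embeddings for a single speaker.
embeddings = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]

# Sampling all three embeddings gives the full mean [2.0, 2.0],
# so a prediction equal to that mean incurs zero loss.
loss_all = sample_mean_loss([2.0, 2.0], embeddings, sample_size=3,
                            rng=random.Random(0))
```

Under this assumed form, the target depends on which embeddings happen to be drawn whenever `sample_size` is smaller than the number available, so the loss for a fixed prediction varies from draw to draw.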

Code Repositories

DonkeyShot21/uis-rnn-sml (official, PyTorch)

Benchmarks

Benchmark: speaker-diarization-on-dihard-ii
Methodology: UIS-RNN-SML
Metrics: DER - no overlap: 19.4; DER(%): 27.3
