Speaker Diarization with LSTM

Quan Wang; Carlton Downey; Li Wan; Philip Andrew Mansfield; Ignacio Lopez Moreno

Abstract

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based audio embeddings, also known as d-vectors, have consistently demonstrated superior speaker verification performance. In this paper, we build on the success of d-vector based speaker verification systems to develop a new d-vector based approach to speaker diarization. Specifically, we combine LSTM-based d-vector audio embeddings with recent work in non-parametric clustering to obtain a state-of-the-art speaker diarization system. We evaluate our system on three standard public datasets, and the results suggest that d-vector based diarization systems offer significant advantages over traditional i-vector based systems. We achieve a 12.0% diarization error rate on NIST SRE 2000 CALLHOME, even though our model is trained on out-of-domain data from voice search logs.
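The clustering half of the pipeline described in the abstract can be illustrated with a minimal sketch. It assumes the per-segment d-vectors have already been extracted (the LSTM embedding network is not shown) and that the number of speakers is known. It uses textbook spectral clustering on a cosine-similarity affinity matrix, not the authors' implementation, and the function names cosine_affinity and spectral_cluster are hypothetical.

```python
import numpy as np


def cosine_affinity(embeddings):
    """Pairwise cosine similarity, rescaled from [-1, 1] to [0, 1]."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return (unit @ unit.T + 1.0) / 2.0


def spectral_cluster(embeddings, num_speakers, num_iters=50):
    """Assign one speaker label per segment embedding (illustrative only)."""
    affinity = cosine_affinity(embeddings)

    # Symmetric normalization: D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order, so the last columns
    # are the eigenvectors with the largest eigenvalues.
    _, eigvecs = np.linalg.eigh(normalized)
    spectral = eigvecs[:, -num_speakers:]
    spectral = spectral / np.linalg.norm(spectral, axis=1, keepdims=True)

    # K-means in the spectral space with deterministic farthest-point init.
    centers = [spectral[0]]
    for _ in range(1, num_speakers):
        dists = np.min(
            [np.linalg.norm(spectral - c, axis=1) for c in centers], axis=0
        )
        centers.append(spectral[np.argmax(dists)])
    centers = np.array(centers)

    for _ in range(num_iters):
        dists = np.linalg.norm(spectral[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for k in range(num_speakers):
            if np.any(labels == k):
                centers[k] = spectral[labels == k].mean(axis=0)
    return labels
```

Calling spectral_cluster(embeddings, num_speakers=2) on an array of shape (num_segments, embedding_dim) returns one integer label per segment. The label values are arbitrary, so only the grouping is meaningful.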

Code Repositories

vadimkantorov/convdia (PyTorch, mentioned in GitHub)
pawel-rozwoda/lstm-diarization (PyTorch, mentioned in GitHub)
muskang48/Speaker-Diarization (TensorFlow, mentioned in GitHub)

Benchmarks

Benchmark: speaker-diarization-on-callhome-109
Methodology: d-vector + spectral
Metrics: DER(%): 12.54
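
As a reminder of what the DER metric measures, here is a simplified frame-level sketch. It assumes equal-length label sequences with None marking non-speech, no overlapping speech, and no forgiveness collar, so it is not the official scoring tool and will not reproduce published numbers exactly. The function name frame_der is hypothetical. DER is the sum of missed speech, false alarm, and speaker confusion, divided by the total reference speech.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level diarization error rate; None marks a non-speech frame."""
    assert len(reference) == len(hypothesis)
    pairs = list(zip(reference, hypothesis))

    total_speech = sum(r is not None for r, _ in pairs)
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    # Frames where both reference and hypothesis contain speech.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]
    ref_speakers = sorted({r for r, _ in both} | {r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    # Hypothesis labels are arbitrary, so search for the one-to-one mapping
    # onto reference speakers that maximizes agreement. Brute force is fine
    # for a handful of speakers; pad with None when the hypothesis has more.
    padded = ref_speakers + [None] * max(0, len(hyp_speakers) - len(ref_speakers))
    best_correct = 0
    for perm in permutations(padded, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech
```

For example, a 10-frame recording with 8 reference speech frames, one false-alarm frame, and one frame attributed to the wrong speaker gives a DER of 2/8 = 0.25.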
