HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

Jee-weon Jung Wangyou Zhang Jiatong Shi Zakaria Aldeneh Takuya Higuchi Barry-John Theobald Ahmed Hussen Abdelaziz Shinji Watanabe

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

Abstract

This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training speaker embedding extractors. First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models. We provide several models, ranging from x-vector to recent SKA-TDNN. Through the modularized architecture design, variants can be developed easily. We also aspire to bridge developed models with other domains, facilitating the broad research community to effortlessly incorporate state-of-the-art embedding extractors. Pre-trained embedding extractors can be accessed in an off-the-shelf manner and we demonstrate the toolkit's versatility by showcasing its integration with two tasks. Another goal is to integrate with diverse self-supervised learning features. We release a reproducible recipe that achieves an equal error rate of 0.39% on the Vox1-O evaluation protocol using WavLM-Large with ECAPA-TDNN.

Code Repositories

espnet/espnet
Official
pytorch
Jungjee/RawNet
tf
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
speaker-recognition-on-voxceleb1WavLM+ECAPA-TDNN
EER: 0.39
speaker-verification-on-voxcelebWavLM+ECAPA-TDNN
EER: 0.39

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models | Papers | HyperAI