Fine-tuning wav2vec2 for speaker recognition

Nik Vaessen David A. van Leeuwen

Abstract

This paper explores applying the wav2vec2 framework to speaker recognition instead of speech recognition. We study the effectiveness of the pre-trained weights on the speaker recognition task, and how to pool the wav2vec2 output sequence into a fixed-length speaker embedding. To adapt the framework to speaker recognition, we propose a single-utterance classification variant trained with cross-entropy (CE) or additive angular margin (AAM) softmax loss, and an utterance-pair classification variant trained with binary cross-entropy (BCE) loss. Our best-performing variant, w2v2-aam, achieves a 1.88% equal error rate (EER) on the extended VoxCeleb1 test set, compared to 1.69% EER for an ECAPA-TDNN baseline. Code is available at https://github.com/nikvaessen/w2v2-speaker.
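The abstract names two technical steps: pooling the frame-level wav2vec2 output into a fixed-length embedding, and training with AAM softmax. Below is a minimal pure-Python sketch of both. It assumes simple mean pooling (one of several possible pooling strategies, not necessarily the one the paper settles on) and the common ArcFace-style formulation of AAM softmax. The function names and the `scale` and `margin` defaults are hypothetical illustrations, not values taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average a (T, D) sequence of frame vectors into one D-dim embedding."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    total = [0.0] * dim
    for frame in frames:
        for i, value in enumerate(frame):
            total[i] += value
    return [value / len(frames) for value in total]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def aam_softmax_loss(
    embedding: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    target: int,
    scale: float = 30.0,  # hypothetical default, not taken from the paper
    margin: float = 0.2,  # hypothetical default, not taken from the paper
) -> float:
    """Cross-entropy over scaled cosine logits, with an additive angular
    margin applied to the target class only (ArcFace-style sketch)."""
    x = l2_normalize(embedding)
    logits = []
    for j, weight in enumerate(class_weights):
        cos = sum(a * b for a, b in zip(x, l2_normalize(weight)))
        cos = max(-1.0, min(1.0, cos))  # keep acos inside its domain
        if j == target:
            # Penalize the target angle: cos(theta) -> cos(theta + margin).
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    # Numerically stable log-sum-exp, then negative log-likelihood of target.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[target]
```

With `margin=0.0` the loss reduces to ordinary softmax cross-entropy over scaled cosine logits; a positive margin makes the target class harder to win, so the same embedding incurs a larger loss. Production implementations also handle the case where theta + margin exceeds pi, which this sketch ignores.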

Benchmarks

Benchmark: speaker-recognition-on-voxceleb1
Method: w2v2-aam
Metric: EER 1.88%
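The reported metric is the equal error rate: the operating point where the false-acceptance rate on non-target (different-speaker) trials equals the false-rejection rate on target (same-speaker) trials. Below is a minimal sketch. It assumes higher scores mean "same speaker" and labels of 1 for target trials and 0 for non-target trials. The function name is hypothetical. The code sweeps every observed score as a threshold and returns the average of the two rates where they are closest. This is a simple quadratic-time approximation, not the scoring script used in the paper.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER by sweeping each observed score as a threshold.

    A trial is accepted when score >= threshold.
    labels: 1 = target (same speaker), 0 = non-target (different speaker).
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    if not targets or not nontargets:
        raise ValueError("need both target and non-target trials")

    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # False acceptance: non-target trials scored at or above threshold.
        far = sum(s >= t for s in nontargets) / len(nontargets)
        # False rejection: target trials scored below threshold.
        frr = sum(s < t for s in targets) / len(targets)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

The function returns a fraction, so an EER of 1.88% corresponds to a return value of 0.0188.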
