BUT System for the MLC-SLM Challenge

Alexander Polok, Jiangyu Han, Dominik Klement, Samuele Cornell, Jan Černocký, Lukáš Burget


Abstract

We present a two-speaker automatic speech recognition (ASR) system that combines DiCoW -- a diarization-conditioned variant of Whisper -- with DiariZen, a diarization pipeline built on top of Pyannote. We first evaluate both systems in out-of-domain (OOD) multilingual scenarios without any fine-tuning. In this scenario, DiariZen consistently outperforms the baseline Pyannote diarization model, demonstrating strong generalization. Despite being fine-tuned on English-only data for target-speaker ASR, DiCoW retains solid multilingual performance, indicating that encoder modifications preserve Whisper's multilingual capabilities. We then fine-tune both DiCoW and DiariZen on the MLC-SLM challenge data. The fine-tuned DiariZen continues to outperform the fine-tuned Pyannote baseline, while DiCoW sees further gains from domain adaptation. Our final system achieves a micro-average tcpWER/CER of 16.75% and ranks second in Task 2 of the MLC-SLM challenge. Lastly, we identify several labeling inconsistencies in the training data -- such as missing speech segments and incorrect silence annotations -- which can hinder diarization fine-tuning. We propose simple mitigation strategies to address these issues and improve system robustness.
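The micro-average reported above pools error counts across all recordings and languages before dividing, rather than averaging per-language error rates. A minimal sketch of that pooling (the function and the per-language counts below are illustrative assumptions, not the challenge's actual scoring code):

```python
def micro_average_error_rate(counts):
    """Micro-average an error rate such as tcpWER/CER.

    counts: list of (errors, reference_length) pairs, one per
    recording or language subset. All counts are summed first,
    so larger subsets weigh more than in a macro average.
    """
    total_errors = sum(e for e, _ in counts)
    total_reference = sum(n for _, n in counts)
    return 100.0 * total_errors / total_reference


# Hypothetical counts: (word errors, reference words) per subset.
subsets = [(120, 1000), (60, 400)]
print(micro_average_error_rate(subsets))  # pooled: 180/1400 words
```

A macro average of the same two subsets would give (12% + 15%) / 2 = 13.5%, while the micro average is 180/1400 ≈ 12.86%; the difference is why the pooling convention matters when subsets differ in size.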
