Improving the previous state-of-the-art Frisian ASR by fine-tuning XLS-R
Golshid Shekoufandeh, Dragoș Alexandru Bălan
Abstract
Automatic speech recognition (ASR) systems, which convert human speech to text, play a major role in digitizing human communication. Despite their significance, most of these systems are designed for higher-resourced languages, such as English, Mandarin, or Spanish, leaving lower-resourced languages, such as Frisian, underrepresented. To address this issue, our paper introduces an ASR model for transcribing Frisian speech, obtained by fine-tuning the Wav2Vec 2.0 XLS-R architecture on the Common Voice corpus, version 12.0. Fine-tuned with a learning rate of 8e-5, our proposed ASR system achieves a word error rate (WER) of 15.99%, surpassing the previous state of the art of 16.25% and serving as a benchmark for future research in this field.
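For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; the abstract does not state which evaluation tool or text normalization the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: splits on whitespace and applies no text
    normalization (casing, punctuation), which real evaluations often do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

A corpus-level score is usually obtained by summing the edit distances over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance rates.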
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| speech-recognition-on-common-voice-frisian | wav2vec2-large-xls-r-1b-frisian | Test WER: 15.99% |