Rostislav Kolobov, Olga Okhapkina, Olga Omelchishina, Andrey Platunov, Roman Bedyakin, Vyacheslav Moshkin, Dmitry Menshikov, Nikolay Mikhaylovskiy

Abstract
The performance of automated speech recognition (ASR) systems is well known to vary across application domains. At the same time, vendors and research groups typically report ASR quality results either for limited, simplistic domains (audiobooks, TED talks) or for proprietary datasets. To fill this gap, we provide NTR MediaSpeech, an open-source 10-hour ASR evaluation dataset for 4 languages: Spanish, French, Turkish and Arabic. The dataset was collected from the official YouTube channels of media outlets in the respective languages and manually transcribed. We estimate that the WER of the dataset transcripts is under 5%. We have benchmarked many commercially and freely available ASR systems and provide the benchmark results. We also open-source baseline QuartzNet models for each language.
Benchmarks
| Benchmark | Methodology | WER (Arabic) | WER (French) | WER (Spanish) | WER (Turkish) |
|---|---|---|---|---|---|
| speech-recognition-on-mediaspeech | Wit | 0.2333 | 0.1759 | 0.0879 | 0.0768 |
| speech-recognition-on-mediaspeech | Silero | - | - | 0.3070 | - |
| speech-recognition-on-mediaspeech | QuartzNet | 0.1300 | 0.1915 | 0.1826 | 0.1422 |
| speech-recognition-on-mediaspeech | Azure | 0.3016 | 0.1683 | 0.1296 | 0.2296 |
| speech-recognition-on-mediaspeech | VOSK | 0.3085 | 0.2111 | 0.1970 | 0.3050 |
| speech-recognition-on-mediaspeech | | 0.4464 | 0.2385 | 0.2176 | 0.2707 |
| speech-recognition-on-mediaspeech | DeepSpeech | - | 0.4741 | 0.4236 | - |
| speech-recognition-on-mediaspeech | wav2vec | 0.9596 | 0.3113 | 0.2469 | 0.5812 |
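
The WER values in the table are fractions (0.0879 means roughly 8.8% of words in error). As a reading aid, here is a minimal sketch of how a word error rate of this kind is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a system hypothesis, divided by the number of reference words. The function name `word_error_rate` and the plain whitespace tokenization are assumptions made for illustration; this page does not describe the text normalization or scoring tool the authors used, so this sketch may not reproduce the reported numbers exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no casing, punctuation, or diacritic normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when a hypothesis is much longer than its reference.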