MediaSpeech: Multilanguage ASR Benchmark and Dataset

Rostislav Kolobov Olga Okhapkina Olga Omelchishina Andrey Platunov Roman Bedyakin Vyacheslav Moshkin Dmitry Menshikov Nikolay Mikhaylovskiy

Abstract

The performance of automated speech recognition (ASR) systems is well known to vary across application domains. At the same time, vendors and research groups typically report ASR quality results either on limited, simplistic domains (audiobooks, TED talks) or on proprietary datasets. To fill this gap, we provide NTR MediaSpeech, an open-source 10-hour ASR evaluation dataset for 4 languages: Spanish, French, Turkish and Arabic. The dataset was collected from the official YouTube channels of media outlets in the respective languages and manually transcribed. We estimate the word error rate (WER) of the dataset's own transcriptions to be under 5%. We have benchmarked many commercially and freely available ASR systems and provide the benchmark results. We also open-source baseline QuartzNet models for each language.
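WER, the metric used for the transcription-quality estimate and for the benchmark below, is conventionally defined as (S + D + I) / N: the number of word substitutions, deletions and insertions needed to turn the system output into the reference, divided by the number of reference words. The following is a minimal Python sketch of that definition. It assumes plain whitespace tokenization and applies no text normalization (casing, punctuation, numerals). The normalization used for MediaSpeech is not described here, so `word_error_rate` is a hypothetical helper and not the authors' scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: tokens are whitespace-separated words, compared verbatim
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Before row i is processed, prev[j] is the edit distance between
    # ref[:i-1] and hyp[:j]. Row 0 means j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete reference word r
                curr[j - 1] + 1,     # insert hypothesis word h
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("el gato está en la casa", "el gato en la casa")` returns 1/6 (about 0.167), because one of six reference words is deleted. Keeping only the previous row of the dynamic-programming table makes memory use proportional to the hypothesis length rather than to the product of the two lengths.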

Code Repositories

NTRLab/MediaSpeech (official)

Benchmarks

Benchmark: speech-recognition-on-mediaspeech
Metric: word error rate (WER), lower is better; "-" = not reported

System      Arabic   French   Spanish  Turkish
Wit         0.2333   0.1759   0.0879   0.0768
Silero      -        -        0.3070   -
QuartzNet   0.1300   0.1915   0.1826   0.1422
Azure       0.3016   0.1683   0.1296   0.2296
VOSK        0.3085   0.2111   0.1970   0.3050
Google      0.4464   0.2385   0.2176   0.2707
DeepSpeech  -        0.4741   0.4236   -
wav2vec     0.9596   0.3113   0.2469   0.5812
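Not every system was evaluated on every language, so the benchmark results above are easiest to compare one language at a time. The sketch below copies those WER values into a dictionary, leaving out unreported entries, and picks the lowest-WER system for each language. The names `MEDIASPEECH_WER` and `best_per_language` are illustrative only and do not come from the MediaSpeech repository.

```python
# WER values from the MediaSpeech benchmark results above (lower is better).
# Languages a system was not evaluated on are simply omitted.
MEDIASPEECH_WER: dict[str, dict[str, float]] = {
    "Wit":        {"Arabic": 0.2333, "French": 0.1759, "Spanish": 0.0879, "Turkish": 0.0768},
    "Silero":     {"Spanish": 0.3070},
    "QuartzNet":  {"Arabic": 0.1300, "French": 0.1915, "Spanish": 0.1826, "Turkish": 0.1422},
    "Azure":      {"Arabic": 0.3016, "French": 0.1683, "Spanish": 0.1296, "Turkish": 0.2296},
    "VOSK":       {"Arabic": 0.3085, "French": 0.2111, "Spanish": 0.1970, "Turkish": 0.3050},
    "Google":     {"Arabic": 0.4464, "French": 0.2385, "Spanish": 0.2176, "Turkish": 0.2707},
    "DeepSpeech": {"French": 0.4741, "Spanish": 0.4236},
    "wav2vec":    {"Arabic": 0.9596, "French": 0.3113, "Spanish": 0.2469, "Turkish": 0.5812},
}


def best_per_language(table: dict[str, dict[str, float]]) -> dict[str, tuple[str, float]]:
    """Return {language: (system, wer)} with the lowest reported WER per language."""
    languages = sorted({lang for scores in table.values() for lang in scores})
    best: dict[str, tuple[str, float]] = {}
    for lang in languages:
        # Only systems that actually report a score for this language compete.
        reported = [(system, scores[lang]) for system, scores in table.items() if lang in scores]
        best[lang] = min(reported, key=lambda pair: pair[1])
    return best


if __name__ == "__main__":
    for lang, (system, wer) in best_per_language(MEDIASPEECH_WER).items():
        print(f"{lang}: {system} ({wer:.4f})")
```

For the values above, the sketch selects QuartzNet for Arabic (0.1300), Azure for French (0.1683), and Wit for both Spanish (0.0879) and Turkish (0.0768). Because the comparison only covers reported scores, systems such as Silero and DeepSpeech are ranked on fewer languages than the others.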
