HyperAIHyperAI

Command Palette

Search for a command to run...

MediaSpeech: Multilanguage ASR Benchmark and Dataset

Rostislav Kolobov Olga Okhapkina Olga Omelchishina Andrey Platunov Roman Bedyakin Vyacheslav Moshkin Dmitry Menshikov Nikolay Mikhaylovskiy

Abstract

The performance of automated speech recognition (ASR) systems is well known to differ for varied application domains. At the same time, vendors and research groups typically report ASR quality results either for limited use simplistic domains (audiobooks, TED talks), or proprietary datasets. To fill this gap, we provide an open-source 10-hour ASR system evaluation dataset NTR MediaSpeech for 4 languages: Spanish, French, Turkish and Arabic. The dataset was collected from the official youtube channels of media in the respective languages, and manually transcribed. We estimate that the WER of the dataset is under 5%. We have benchmarked many ASR systems available both commercially and freely, and provide the benchmark results. We also open-source baseline QuartzNet models for each language.


Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing

HyperAI Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
MediaSpeech: Multilanguage ASR Benchmark and Dataset | Papers | HyperAI