Scribosermo: Fast Speech-to-Text models for German and other Languages

Daniel Bermuth, Alexander Poeppel, Wolfgang Reif

Abstract

Recent Speech-to-Text models often require a large amount of hardware resources and are mostly trained on English. This paper presents Speech-to-Text models for German, as well as for Spanish and French, with special features: (a) They are small and run in real time on microcontrollers like a Raspberry Pi. (b) Starting from a pretrained English model, they can be trained on consumer-grade hardware with a relatively small dataset. (c) The models are competitive with other solutions and outperform them in German. The models thus combine the advantages of other approaches, each of which offers only a subset of these features. Furthermore, the paper provides a new library for handling datasets, which focuses on easy extension with additional datasets, and shows an optimized way of transfer-learning new languages using a pretrained model from another language with a similar alphabet.

Benchmarks

| Benchmark | Methodology | Metrics |
| speech-recognition-on-common-voice-french | QuartzNet15x5FR (CV-only) | Test WER: 12.1% |
| speech-recognition-on-common-voice-french | ConformerCTC-L (5-gram) | Test WER: 8.13% |
| speech-recognition-on-common-voice-french | ConformerCTC-L (no-LM) | Test WER: 10.19% |
| speech-recognition-on-common-voice-french | QuartzNet15x5FR (D7) | Test WER: 11.0% |
| speech-recognition-on-common-voice-german | QuartzNet15x5DE (D37, 5-gram) | Test CER: 2.7%, Test WER: 6.6% |
| speech-recognition-on-common-voice-german | ConformerCTC-L (5-gram) | Test CER: 1.37%, Test WER: 4.05% |
| speech-recognition-on-common-voice-german | QuartzNet15x5DE (CV-only, 5-gram) | Test CER: 3.2%, Test WER: 7.7% |
| speech-recognition-on-common-voice-german | ConformerCTC-L (no LM) | Test CER: 2.05%, Test WER: 7.33% |
| speech-recognition-on-common-voice-italian | QuartzNet15x5IT (D5) | Test WER: 11.5% |
| speech-recognition-on-common-voice-spanish | QuartzNet15x5ES (CV-only) | Test WER: 10.5% |
| speech-recognition-on-common-voice-spanish | ConformerCTC-L (5-gram) | Test WER: 5.68% |
| speech-recognition-on-common-voice-spanish | ConformerCTC-L (no-LM) | Test WER: 7.46% |
| speech-recognition-on-common-voice-spanish | QuartzNet15x5ES (D8) | Test WER: 10.0% |
| speech-recognition-on-tuda | QuartzNet15x5DE (D37) | Test WER: 10.2% |
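
The benchmark results are reported as word error rate (WER) and character error rate (CER). As a rough illustration of how these metrics are commonly defined, the sketch below computes both as the Levenshtein edit distance between reference and hypothesis, divided by the reference length. The function names and the example sentences are hypothetical, and the text normalization used for the published numbers is not stated here, so this should not be read as the paper's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words.
    Assumes whitespace tokenization (an illustrative choice)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.
    Spaces are counted as characters here (an illustrative choice)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    reference = "das ist ein test"
    hypothesis = "das ist ein text"
    print(f"WER: {wer(reference, hypothesis):.2%}")
    print(f"CER: {cer(reference, hypothesis):.2%}")
```

In this made-up example one of four words is wrong, while only one of sixteen characters differs, which is why CER values in the table are consistently lower than the corresponding WER values.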
