3 months ago

Vietnamese end-to-end speech recognition using wav2vec 2.0

Thai Binh Nguyen

Abstract

Our models are pre-trained on 13k hours of unlabeled Vietnamese YouTube audio and fine-tuned on 250 hours of labeled data from the VLSP ASR dataset, using speech audio sampled at 16 kHz. We use the wav2vec2 architecture for the pre-trained model. In the fine-tuning phase, wav2vec2 is fine-tuned using Connectionist Temporal Classification (CTC), an algorithm for training neural networks on sequence-to-sequence problems, used mainly in automatic speech recognition and handwriting recognition. On the Vivos dataset, we achieved a WER of 6.15.
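To make the CTC step more concrete, here is a minimal sketch of the collapsing rule that CTC relies on: consecutive repeated labels are merged, then blank symbols are dropped. This is what lets a long sequence of per-frame predictions map to a shorter transcript. The abstract does not say how the models decode their output, so the greedy (best-label-per-frame) decoder below is an assumption for illustration only, as are the function names, the `_` blank symbol, and the toy vocabulary.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen only for this illustration


def ctc_collapse(frame_labels, blank=BLANK):
    """Apply the CTC collapsing rule: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Pick the highest-scoring label in each frame, then collapse the result.

    frame_scores: list of per-frame score lists, one score per vocab entry.
    vocab: list of labels, aligned with the score positions.
    """
    best_labels = [
        vocab[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    return "".join(ctc_collapse(best_labels, blank))


# Toy example: four frames over a three-symbol vocabulary.
toy_vocab = [BLANK, "a", "b"]
toy_scores = [
    [0.1, 0.8, 0.1],    # best label: "a"
    [0.2, 0.7, 0.1],    # best label: "a" (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # best label: blank (dropped)
    [0.1, 0.2, 0.7],    # best label: "b"
]
decoded = ctc_greedy_decode(toy_scores, toy_vocab)
```

Note that a blank between two identical labels keeps them separate, which is how CTC can represent doubled characters.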

Benchmarks

Benchmark | Methodology | Metrics
speech-recognition-on-common-voice-vi | Vietnamese end-to-end speech recognition using wav2vec 2.0 by VietAI | Test WER: 11.52
speech-recognition-on-vivos | Vietnamese end-to-end speech recognition using wav2vec 2.0 by VietAI | Test WER: 6.15
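
For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate (WER) is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The page does not describe the scoring script or text normalization used for the figures above, so the whitespace tokenization and the function name here are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Return word-level edit distance divided by the number of reference words.

    Tokenization is plain whitespace splitting, which is an assumption;
    real evaluations often normalize case and punctuation first.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref_words)


# One substitution ("chào" -> "chao") and one deletion ("nam") over 4 words.
example_wer = word_error_rate("xin chào việt nam", "xin chao việt")
```

The function returns a fraction; the figures in the table are presumably percentages (the page does not state the unit), in which case the fraction would be multiplied by 100 for comparison.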
