FunASR: A Fundamental End-to-End Speech Recognition Toolkit

Abstract

This paper introduces FunASR, an open-source speech recognition toolkit designed to bridge the gap between academic research and industrial applications. FunASR offers models trained on large-scale industrial corpora, along with the tooling needed to deploy them in applications. The toolkit's flagship model, Paraformer, is a non-autoregressive end-to-end speech recognition model trained on a manually annotated, 60,000-hour Mandarin speech recognition dataset. To improve Paraformer, we have added timestamp prediction and hotword customization capabilities to the standard Paraformer backbone. In addition, to facilitate model deployment, we have open-sourced a voice activity detection model based on the Feedforward Sequential Memory Network (FSMN-VAD) and a text post-processing punctuation model based on the controllable time-delay Transformer (CT-Transformer), both trained on industrial corpora. These functional modules provide a solid foundation for building high-precision long-audio speech recognition services. Paraformer outperforms other models trained on open datasets.
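The abstract presents VAD, recognition, and punctuation as modules that together support long-audio transcription. The following is a minimal sketch of one plausible way to chain them, assuming that the VAD stage returns speech segments as (start_ms, end_ms) pairs, that the recognizer returns word timestamps relative to each segment, and that punctuation runs once over the joined tokens. `Word`, `transcribe_long_audio`, and the toy stage functions are hypothetical illustrations; this is not FunASR's API and not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Word:
    text: str
    start_ms: int  # start time in milliseconds
    end_ms: int    # end time in milliseconds


# Hypothetical stage interfaces (illustrative only, not FunASR's API).
VadFn = Callable[[Sequence[float]], List[Tuple[int, int]]]  # -> speech segments (start_ms, end_ms)
AsrFn = Callable[[Sequence[float]], List[Word]]             # -> words timed relative to the segment
PuncFn = Callable[[List[str]], str]                         # -> punctuated transcript


def transcribe_long_audio(audio: Sequence[float], sample_rate: int,
                          vad: VadFn, asr: AsrFn, punctuate: PuncFn):
    """Segment with VAD, recognize each segment, shift timestamps to absolute time, then punctuate."""
    words: List[Word] = []
    for seg_start, seg_end in vad(audio):
        # Convert segment boundaries from milliseconds to sample indices.
        lo = seg_start * sample_rate // 1000
        hi = seg_end * sample_rate // 1000
        for w in asr(audio[lo:hi]):
            # Recognizer timestamps are segment-relative; offset them by the segment start.
            words.append(Word(w.text, w.start_ms + seg_start, w.end_ms + seg_start))
    transcript = punctuate([w.text for w in words])
    return transcript, words


if __name__ == "__main__":
    sr = 16000
    audio = [0.0] * (sr * 3)  # 3 s of placeholder samples
    toy_vad = lambda x: [(0, 1000), (1500, 3000)]
    # Toy recognizer: emits one word per segment, timed 100-400 ms into the segment.
    toy_asr = lambda seg: [Word("hello" if len(seg) == sr else "world", 100, 400)]
    toy_punc = lambda toks: " ".join(toks) + "."
    text, words = transcribe_long_audio(audio, sr, toy_vad, toy_asr, toy_punc)
    print(text)  # hello world.
    print([(w.text, w.start_ms, w.end_ms) for w in words])  # [('hello', 100, 400), ('world', 1600, 1900)]
```

Passing the stages in as plain callables keeps the sketch independent of any particular model; the one step that matters for long audio is re-basing each segment's timestamps onto the full recording before punctuation.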

Benchmarks

Benchmark                          Methodology       Params (M)   Metric                        Value (%)
speech-recognition-on-aishell-1    Paraformer        46.3         Word Error Rate (WER)         4.95
speech-recognition-on-aishell-1    Paraformer-large  220          Word Error Rate (WER)         1.95
speech-recognition-on-aishell-2    Paraformer        -            Word Error Rate (WER)         5.73
speech-recognition-on-aishell-2    Paraformer-large  -            Word Error Rate (WER)         2.85
speech-recognition-on-wenetspeech  Paraformer-large  -            Character Error Rate (CER)    6.97
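As a worked reading of the benchmark numbers, this snippet computes the relative error reduction from Paraformer to Paraformer-large on AISHELL-1 and AISHELL-2, plus the parameter ratio listed for AISHELL-1. It uses only the values in the table; the function name `relative_reduction` is an illustrative choice, and the results are derived arithmetic, not figures reported by the paper.

```python
# Error rates (%) copied from the benchmark table.
results = {
    "AISHELL-1": {"Paraformer": 4.95, "Paraformer-large": 1.95},
    "AISHELL-2": {"Paraformer": 5.73, "Paraformer-large": 2.85},
}


def relative_reduction(base: float, new: float) -> float:
    """Relative error reduction of `new` versus `base`, as a fraction of `base`."""
    return (base - new) / base


for name, r in results.items():
    rel = relative_reduction(r["Paraformer"], r["Paraformer-large"])
    print(f"{name}: {rel:.1%} relative reduction")
# AISHELL-1: 60.6% relative reduction
# AISHELL-2: 50.3% relative reduction

# Parameter counts listed for AISHELL-1: 220M vs 46.3M.
print(f"Paraformer-large has {220 / 46.3:.2f}x the parameters")  # 4.75x
```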
