6 months ago

Zhifu Gao Zerui Li Jiaming Wang Haoneng Luo Xian Shi Mengzhe Chen Yabin Li Lingyun Zuo Zhihao Du Zhangyu Xiao

Abstract

This paper introduces FunASR, an open-source speech recognition toolkit designed to bridge the gap between academic research and industrial applications. FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications. The toolkit's flagship model, Paraformer, is a non-autoregressive end-to-end speech recognition model that has been trained on a manually annotated Mandarin speech recognition dataset that contains 60,000 hours of speech. To improve the performance of Paraformer, we have added timestamp prediction and hotword customization capabilities to the standard Paraformer backbone. In addition, to facilitate model deployment, we have open-sourced a voice activity detection model based on the Feedforward Sequential Memory Network (FSMN-VAD) and a text post-processing punctuation model based on the controllable time-delay Transformer (CT-Transformer), both of which were trained on industrial corpora. These functional modules provide a solid foundation for building high-precision long audio speech recognition services. Compared to other models trained on open datasets, Paraformer demonstrates superior performance.
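The abstract describes three modules that together serve long-audio recognition: voice activity detection, speech recognition, and punctuation restoration. The sketch below shows one way such stages could be chained. It is not FunASR's API: `LongAudioPipeline`, `toy_vad`, `toy_asr`, and `toy_punc` are hypothetical names, and the toy functions are trivial stand-ins for FSMN-VAD, Paraformer, and CT-Transformer respectively.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # (start_sample, end_sample), end exclusive


@dataclass
class LongAudioPipeline:
    """Hypothetical composition of VAD -> ASR -> punctuation stages."""
    vad: Callable[[Sequence[float]], List[Segment]]
    asr: Callable[[Sequence[float]], str]
    punc: Callable[[str], str]

    def transcribe(self, audio: Sequence[float]) -> str:
        # 1. Split the long recording into speech segments.
        segments = self.vad(audio)
        # 2. Recognize each segment independently.
        pieces = [self.asr(audio[start:end]) for start, end in segments]
        # 3. Restore punctuation on the joined raw text.
        return self.punc(" ".join(p for p in pieces if p))


def toy_vad(audio, threshold=0.1):
    """Toy stand-in: contiguous runs of samples at or above a threshold."""
    segments, start = [], None
    for i, sample in enumerate(audio):
        if abs(sample) >= threshold and start is None:
            start = i
        elif abs(sample) < threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(audio)))
    return segments


def toy_asr(segment):
    """Toy stand-in: reports the segment length instead of recognizing speech."""
    return f"speech{len(segment)}"


def toy_punc(text):
    """Toy stand-in: appends a full stop to non-empty text."""
    return text + "." if text else text


pipeline = LongAudioPipeline(vad=toy_vad, asr=toy_asr, punc=toy_punc)
audio = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.8, 0.9, 0.0]
result = pipeline.transcribe(audio)
print(result)
```

Keeping each stage behind a plain callable means a real VAD, recognizer, or punctuation model could replace its stand-in without changing the orchestration logic.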

Audio and Speech Processing

FunASR: A Fundamental End-to-End Speech Recognition Toolkit
