6 months ago

Srijith Radhakrishnan Chao-Han Huck Yang Sumeer Ahmad Khan Rohit Kumar Narsis A. Kiani David Gomez-Cabrero Jesper N. Tegner

Abstract

We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR). Our method leverages both acoustic information and external linguistic representations to generate accurate speech transcription contexts. This marks a step towards a new paradigm of generative error correction over n-best hypotheses. Unlike existing ranking-based rescoring methods, our approach uses distinct initialization techniques and parameter-efficient algorithms to boost the ASR performance obtained from pre-trained speech and text models. Through evaluation across diverse ASR datasets, we assess the stability and reproducibility of our fusion technique and demonstrate a word error rate relative (WERR) improvement of 37.66% over the n-best hypotheses. To encourage future research, we have made our code and pre-trained models open source at https://github.com/Srijith-rkr/Whispering-LLaMA.
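As a rough illustration of the metric reported in the abstract, the sketch below computes word error rate (WER) with a word-level edit distance and then the relative reduction between a baseline and a corrected transcript. It assumes the common definition of relative improvement, (baseline WER - corrected WER) / baseline WER. The function names and the toy numbers are hypothetical and are not taken from the paper or its repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, corrected_wer: float) -> float:
    """Relative WER reduction in percent (assumed definition, see lead-in)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - corrected_wer) / baseline_wer


if __name__ == "__main__":
    # Toy example only: these sentences are illustrative, not paper data.
    reference = "the cat sat on the mat"
    baseline = word_error_rate(reference, "the cat sat in a mat")
    corrected = word_error_rate(reference, "the cat sat on a mat")
    print(f"baseline WER:  {baseline:.3f}")
    print(f"corrected WER: {corrected:.3f}")
    print(f"relative reduction: {relative_wer_reduction(baseline, corrected):.1f}%")
```

Under this definition, a relative improvement of 37.66% means the corrected WER is roughly 62% of the baseline WER, whatever the absolute values are.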

Audio and Speech Processing

Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
