8 months ago

Abstract

Research in auditory, visual, and audiovisual speech recognition (ASR, VSR, and AVSR, respectively) has traditionally been conducted independently. Even recent self-supervised studies addressing two or all three tasks simultaneously tend to yield separate models, leading to disjoint inference pipelines with increased memory requirements and redundancies. This paper proposes unified training strategies for these systems. We demonstrate that training a single model for all three tasks enhances VSR and AVSR performance, overcoming typical optimisation challenges when training from scratch. Moreover, we introduce a greedy pseudo-labelling approach to more effectively leverage unlabelled samples, addressing shortcomings in related self-supervised methods. Finally, we develop a self-supervised pre-training method within our framework, proving its effectiveness alongside our semi-supervised approach. Despite using a single model for all tasks, our unified approach achieves state-of-the-art performance compared to recent methods on LRS3 and LRS2 for ASR, VSR, and AVSR, as well as on the newly released WildVSR dataset. Code and models are available at https://github.com/ahaliassos/usr.
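The abstract does not say how the greedy pseudo-labels are produced. As an illustrative assumption only, not the paper's procedure, one common greedy scheme is CTC best-path decoding of a teacher model's frame-level scores: take the highest-scoring token at each frame, collapse consecutive repeats, then drop blanks. The function name, the blank index of 0, and the T x V score layout below are hypothetical choices made for this sketch.

```python
from typing import List, Optional, Sequence


def greedy_ctc_pseudo_label(
    frame_scores: Sequence[Sequence[float]], blank_id: int = 0
) -> List[int]:
    """Greedy (best-path) CTC decoding for one utterance.

    frame_scores: T x V scores (probabilities or log-probabilities) from a
    hypothetical teacher model, one row per frame and one column per token.
    Returns the decoded token ids, usable as a pseudo-label.
    """
    # Argmax over the vocabulary at every frame (no beam search).
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]

    labels: List[int] = []
    prev: Optional[int] = None
    for tok in best_path:
        # Keep a token only if it differs from the previous frame's token
        # (collapsing repeats) and is not the blank symbol.
        if tok != prev and tok != blank_id:
            labels.append(tok)
        prev = tok
    return labels


# Usage: five frames over a 3-token vocabulary where token 0 is the blank.
scores = [
    [0.1, 0.8, 0.1],    # frame argmax: token 1
    [0.1, 0.7, 0.2],    # token 1 again (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank (dropped)
    [0.2, 0.1, 0.7],    # token 2
    [0.3, 0.6, 0.1],    # token 1
]
print(greedy_ctc_pseudo_label(scores))  # [1, 2, 1]
```

Greedy decoding needs a single pass over the frames, so it is much cheaper than beam search. That cost matters if pseudo-labels are regenerated for many unlabelled samples during training.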

Alexandros Haliassos Rodrigo Mira Honglie Chen Zoe Landgraf Stavros Petridis Maja Pantic

Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs