3 months ago

Lip Graph Assisted Audio-Visual Speech Recognition Using Bidirectional Synchronous Fusion

Bing Yang, Zhan Chen, Hong Liu

Abstract

Recent studies have shown that extracting representative visual features and efficiently fusing the audio and visual modalities are vital for audio-visual speech recognition (AVSR), but both remain challenging. To this end, we propose a lip graph assisted AVSR method with bidirectional synchronous fusion. First, a hybrid visual stream combines an image branch and a graph branch to capture discriminative visual features. Specifically, the lip graph exploits the natural and dynamic connections between the lip key points to model the lip shape, and the temporal evolution of the lip graph is captured by graph convolutional networks followed by bidirectional gated recurrent units. Second, the hybrid visual stream is combined with the audio stream by an attention-based bidirectional synchronous fusion, which allows bidirectional information interaction to resolve the asynchrony between the two modalities during fusion. Experimental results on the LRW-BBC dataset show that our method outperforms the end-to-end AVSR baseline method in both clean and noisy conditions.
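The abstract does not spell out how the attention-based bidirectional fusion is computed, so the following is only a minimal sketch of the general idea, not the authors' architecture. It assumes plain scaled dot-product attention over unprojected features, and that each stream is concatenated with the context it gathers from the other stream. The function names `attend` and `bidirectional_fusion` and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, memory):
    """Scaled dot-product attention (assumed form, no learned projections).

    Each query frame receives a weighted average of the memory frames,
    so the result has one context vector per query frame.
    """
    dim = len(queries[0])
    contexts = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory
        ]
        weights = softmax(scores)
        contexts.append(
            [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)]
        )
    return contexts


def bidirectional_fusion(audio, visual):
    """Hypothetical two-way fusion of an audio and a visual sequence.

    Audio frames attend to visual frames and vice versa, so the two
    sequences may differ in length and need not be aligned frame by frame.
    Each frame is then concatenated with its cross-modal context.
    """
    audio_ctx = attend(audio, visual)   # visual information per audio frame
    visual_ctx = attend(visual, audio)  # audio information per visual frame
    fused_audio = [a + c for a, c in zip(audio, audio_ctx)]
    fused_visual = [v + c for v, c in zip(visual, visual_ctx)]
    return fused_audio, fused_visual


# Toy input: 4 audio frames and 3 visual frames, 2 features each.
audio = [[0.1, 0.9], [0.4, 0.6], [0.8, 0.2], [0.5, 0.5]]
visual = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused_audio, fused_visual = bidirectional_fusion(audio, visual)
```

Because every frame in one modality computes its own weights over all frames of the other, the sketch tolerates different frame rates and small temporal offsets, which is the kind of asynchrony the abstract refers to. A trained model would add learned query, key, and value projections and feed the fused sequences to a recognition back end.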

Benchmarks

Benchmark: landmark-based-lipreading-on-lrw
Methodology: Lip Graph Assisted
Metrics: Top 1 Accuracy: 49.3
