Another Point of View on Visual Speech Recognition

Frederic Precioso, Charles Bouveyron, Giacomo Valenti, Laurent Pilati, Baptiste Pouthier

Abstract

Standard Visual Speech Recognition (VSR) systems process images directly as input features, without any a priori link between raw pixel data and facial traits. Pixel information can instead be filtered by extracting facial landmarks from the images and repurposing them as graph nodes, whose evolution through time is then modeled by a Graph Convolutional Network. However, with graph-based VSR still in its infancy, the selection of points and their correlations remain ill-defined and are often bound to a priori knowledge and handcrafted techniques. In this paper, we investigate the graph approach to VSR and its ability to learn the correlation between points beyond the mouth region. We also study the contribution that each facial region brings to system accuracy, showing that more scattered but better connected graphs can be both computationally light and accurate.
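
As a rough illustration of the idea of treating facial landmarks as graph nodes, the sketch below applies one generic graph convolution to a short sequence of landmark coordinates. It is not the authors' architecture: the edge list, the number of frames and nodes, the feature sizes, and the helper names `normalized_adjacency` and `graph_conv` are all assumptions made for this example, and the random array stands in for the output of a real landmark detector.

```python
import numpy as np


def normalized_adjacency(edges, num_nodes):
    """Build D^{-1/2} (A + I) D^{-1/2} from an undirected edge list."""
    a = np.eye(num_nodes)  # self-loops keep each node's own features
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_hat, weight):
    """Apply one graph convolution independently to every frame.

    x:      (frames, nodes, in_features) landmark features
    a_hat:  (nodes, nodes) normalized adjacency
    weight: (in_features, out_features) projection matrix
    """
    mixed = np.einsum("ij,tjf->tif", a_hat, x)  # aggregate neighbor features
    return np.maximum(mixed @ weight, 0.0)      # linear projection + ReLU


# Hypothetical toy input: 4 frames, 5 landmarks, (x, y) coordinates each.
rng = np.random.default_rng(0)
landmarks = rng.random((4, 5, 2))

# Hypothetical hand-picked edges; the abstract notes that such choices
# are usually handcrafted in current graph-based systems.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4)]
a_hat = normalized_adjacency(edges, num_nodes=5)

weight = rng.standard_normal((2, 8))  # random stand-in for trained weights
features = graph_conv(landmarks, a_hat, weight)
print(features.shape)  # (4, 5, 8)
```

Here the adjacency matrix is fixed by hand. A system that learns the correlation between points would instead treat the connectivity itself as trainable, which this sketch does not attempt.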

Benchmarks

Benchmark: landmark-based-lipreading-on-lrw
Methodology: Another Point of View
Metrics: Top 1 Accuracy: 62.7
