Computer Vision

Convolutional Neural Network

Video Understanding

Method/Architecture

Frederic Precioso Charles Bouveyron Giacomo Valenti Laurent Pilati Baptiste Pouthier

Abstract

Standard Visual Speech Recognition (VSR) systems process raw images directly as input features, with no a priori link between pixel data and facial traits. Extracting facial landmarks from each frame and repurposing them as graph nodes selectively filters this pixel information, and the evolution of the landmarks over time can then be modeled by a Graph Convolutional Network (GCN). However, because graph-based VSR is still in its infancy, the choice of landmarks and the connections between them remain ill-defined and often depend on a priori knowledge and handcrafted techniques. In this paper, we investigate the graph approach to VSR and its ability to learn correlations between points beyond the mouth region. We also study the contribution each facial region makes to system accuracy, showing that more widely scattered but better-connected graphs can be both computationally light and accurate.
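The abstract describes the pipeline only at a high level: landmarks become graph nodes, a GCN models how they move over time, and the connections between points may be learned rather than fixed by hand. The NumPy sketch below illustrates that general idea under stated assumptions. It is not the authors' architecture. The function names (`normalize_adjacency`, `st_graph_conv`), the learnable offset `b_learned` added to a hand-designed adjacency, the 1-D temporal convolution, and the toy five-node graph are all hypothetical choices made for illustration. Training is omitted, so `b_learned` is simply zero-initialized here.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def st_graph_conv(x, a_norm, b_learned, w_spatial, w_temporal):
    """One hypothetical spatio-temporal block over a landmark sequence.

    x:          (T, N, C_in)      per-frame landmark features (e.g. 2-D coordinates)
    a_norm:     (N, N)            normalized hand-designed adjacency
    b_learned:  (N, N)            learnable offset so any two points can interact
    w_spatial:  (C_in, C_out)     per-node channel projection
    w_temporal: (K, C_out, C_out) temporal kernel, K odd
    returns:    (T, N, C_out)
    """
    adj = a_norm + b_learned                      # effective graph: fixed + learned
    # Spatial step: aggregate neighbours within each frame, then project channels.
    h = np.einsum("nm,tmc,cd->tnd", adj, x, w_spatial)
    # Temporal step: 1-D convolution along time for every node, zero "same" padding.
    k = w_temporal.shape[0]
    pad = k // 2
    h_pad = np.pad(h, ((pad, pad), (0, 0), (0, 0)))
    t_len = h.shape[0]
    out = np.zeros_like(h)
    for j in range(k):
        out += h_pad[j:j + t_len] @ w_temporal[j]
    return np.maximum(out, 0.0)                   # ReLU


# Usage on a toy graph: 4 lip landmarks in a ring plus one jaw point linked to landmark 0.
rng = np.random.default_rng(0)
T, N, C_in, C_out = 8, 5, 2, 16
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4)]
adj = np.zeros((N, N))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
a_norm = normalize_adjacency(adj)
b_learned = np.zeros((N, N))   # would be trained; zero-init starts from the fixed graph
x = rng.normal(size=(T, N, C_in))
w_s = rng.normal(scale=0.1, size=(C_in, C_out))
w_t = rng.normal(scale=0.1, size=(3, C_out, C_out))
features = st_graph_conv(x, a_norm, b_learned, w_s, w_t)
print(features.shape)  # (8, 5, 16)
```

Letting `b_learned` take nonzero values during training is one simple way a model could connect points that the hand-designed graph leaves unlinked, such as lip and jaw landmarks. This is the kind of correlation beyond the mouth region that the abstract discusses.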

Another Point of View on Visual Speech Recognition