LipNet: End-to-End Sentence-level Lipreading

Yannis M. Assael; Brendan Shillingford; Shimon Whiteson; Nando de Freitas

Abstract

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing work on models trained end-to-end performs only word classification, rather than sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy on the sentence-level, overlapped speaker split task, outperforming experienced human lipreaders and the previous 86.4% word-level state-of-the-art accuracy (Gergen et al., 2016).
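
As a rough illustration of how the per-frame outputs of a model trained with connectionist temporal classification can be turned into text, the following is a minimal Python sketch of greedy (best-path) CTC collapsing: repeated labels are merged, then blank symbols are dropped. The function name, the "_" blank symbol, and the example frame labels are assumptions made for illustration; the abstract does not say which decoding procedure LipNet uses, and this sketch is not the authors' implementation.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this sketch only


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence into text.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove the blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


if __name__ == "__main__":
    # Eight video frames, one (made-up) most-likely label per frame.
    print(ctc_greedy_collapse(list("bb_ii_nn")))  # bin
    # A blank between two identical letters keeps both of them.
    print(ctc_greedy_collapse(list("s_e_e")))     # see
```

Because the blank separates genuine repeats, an output shorter than the number of input frames can still contain doubled letters, which is what lets a variable-length frame sequence map to a shorter character sequence.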

Code Repositories

Mentioned in GitHub:
pjenpoomjai/LipNet (tf)
speech-separation-hse/video-features (pytorch)
sailordiary/LipNet-PyTorch (pytorch)
ski-net/lipnet (mxnet)
Abishalini/LipReadingGUI
Fengdalu/LipNet-PyTorch (pytorch)
hero9968/lipnet-python (tf)
ms8909/LipONet (tf)
rizkiarm/LipNet (Official, tf)
LiZhenghua0311/lip (tf)
PlatDrake2875/LipNet (pytorch)

Benchmarks

Benchmark: lipreading-on-grid-corpus-mixed-speech
Methodology: LipNet
Metrics: Word Error Rate (WER): 4.6
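
For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, insertions, and deletions) between a predicted sentence and the reference, divided by the number of reference words. The sketch below shows that standard definition in plain Python. The function name and the example sentences are hypothetical, and the exact evaluation protocol behind the figure listed above is not given on this page.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences only; not taken from any dataset.
    reference = "place red at c zero again"
    hypothesis = "place red at b zero again"
    print(word_error_rate(reference, hypothesis))  # one substitution in six words
```

A perfect prediction gives 0.0, and a hypothesis with many inserted words can push the value above 1.0, so WER is not bounded the way an accuracy score is.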
