Learn an Effective Lip Reading Model without Pains

Feng Dalu ; Yang Shuang ; Shan Shiguang ; Chen Xilin

Abstract

Lip reading, also known as visual speech recognition, aims to recognize speech content from videos by analyzing lip dynamics. There has been considerable progress in recent years, benefiting greatly from rapidly developing deep learning techniques and recent large-scale lip-reading datasets. Most existing methods obtain high performance by constructing a complex neural network together with several customized training strategies, which are typically described only very briefly or shown only in the source code. We find that making proper use of these strategies can consistently bring substantial improvements without changing much of the model. Considering the non-negligible effects of these strategies and the current difficulty of training an effective lip reading model, we perform, for the first time, a comprehensive quantitative study and comparative analysis to show the effects of several different choices for lip reading. By introducing only some easy-to-get refinements to the baseline pipeline, we improve performance from 83.7% to 88.4% and from 38.2% to 55.7% on the two largest publicly available lip reading datasets, LRW and LRW-1000, respectively. These results are comparable to, and even surpass, the existing state-of-the-art results.

Benchmarks

Benchmark | Methodology | Top-1 Accuracy
lipreading-on-lip-reading-in-the-wild | 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR | 85.5%
lipreading-on-lip-reading-in-the-wild | 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR (Word Boundary) | 88.4%
lipreading-on-lrw-1000 | 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR (Word Boundary) | 55.7%
lipreading-on-lrw-1000 | 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR | 48.3%
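
The methodology entries above name three training refinements: MixUp, label smoothing, and a cosine learning-rate schedule. The sketch below shows the standard textbook form of each in plain Python, only to make the terms concrete. It is not the authors' implementation: the function names and every default value (eps=0.1, alpha=0.2, lr_max=3e-4) are illustrative assumptions, since this page does not state the paper's settings.

```python
import math
import random


def smooth_labels(target, num_classes, eps=0.1):
    """Label smoothing: spread eps uniformly over all classes and
    give the remaining 1 - eps to the true class (eps is an assumed value)."""
    dist = [eps / num_classes] * num_classes
    dist[target] += 1.0 - eps
    return dist


def mixup(x_a, x_b, y_a, y_b, alpha=0.2, rng=random):
    """MixUp: blend two samples and their label distributions with a
    weight lam drawn from Beta(alpha, alpha) (alpha is an assumed value)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


def cosine_lr(step, total_steps, lr_max=3e-4, lr_min=0.0):
    """Cosine schedule: decay the learning rate from lr_max at step 0
    to lr_min at total_steps (both rates are assumed values)."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Toy inputs: two 2-dimensional "samples" with 3-class smoothed labels.
    y_a = smooth_labels(0, 3)
    y_b = smooth_labels(1, 3)
    x, y, lam = mixup([1.0, 0.0], [0.0, 1.0], y_a, y_b)
    print("mixed input:", x, "mixed label:", y, "lam:", lam)
    print("learning rate halfway through training:", cosine_lr(50, 100))
```

In a real pipeline the flat lists would be video tensors and the mixed label distribution would feed a cross-entropy loss; plain lists are used here so the sketch runs without any third-party library.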
