Pingchuan Ma, Yujiang Wang, Stavros Petridis, Jie Shen, Maja Pantic

Abstract
Several training strategies and temporal models have been recently proposed for isolated word lip-reading in a series of independent works. However, the potential of combining the best strategies and investigating the impact of each of them has not been explored. In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies, like self-distillation and using word boundary indicators. Our results show that Time Masking (TM) is the most important augmentation, followed by mixup, and that Densely-Connected Temporal Convolutional Networks (DC-TCN) are the best temporal model for lip-reading of isolated words. Using self-distillation and word boundary indicators is also beneficial, but to a lesser extent. A combination of all the above methods results in a classification accuracy of 93.4%, an absolute improvement of 4.6% over the current state-of-the-art performance on the LRW dataset. The performance can be further improved to 94.1% by pre-training on additional datasets. An error analysis of the various training strategies reveals that the performance improves by increasing the classification accuracy of hard-to-recognise words.
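To make the two most impactful strategies concrete, below is a minimal PyTorch sketch of Time Masking and mixup applied to video clips. This is not the authors' implementation: the `(T, C, H, W)` clip layout, the `max_mask_len` and `alpha` values, and the function names are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch (assumed shapes/parameters, not the paper's code) of
# Time Masking (TM) and mixup for isolated-word lip-reading training.
import torch
import torch.nn.functional as F

def time_mask(clip: torch.Tensor, max_mask_len: int = 10) -> torch.Tensor:
    """Zero out a random contiguous span of frames.

    clip: video tensor of shape (T, C, H, W); max_mask_len is an
    illustrative upper bound on the masked span, in frames.
    """
    T = clip.size(0)
    mask_len = torch.randint(1, max_mask_len + 1, (1,)).item()
    start = torch.randint(0, max(T - mask_len, 1), (1,)).item()
    clip = clip.clone()
    clip[start:start + mask_len] = 0.0  # blank the selected frames
    return clip

def mixup(x1, y1, x2, y2, num_classes: int, alpha: float = 0.4):
    """Convexly combine two clips and their one-hot labels.

    The mixing weight is drawn from Beta(alpha, alpha); alpha=0.4 is a
    common choice, assumed here for illustration.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x = lam * x1 + (1 - lam) * x2
    y = lam * F.one_hot(y1, num_classes).float() + \
        (1 - lam) * F.one_hot(y2, num_classes).float()
    return x, y

# Example usage on a dummy 29-frame grayscale mouth-crop clip:
clip = time_mask(torch.randn(29, 1, 88, 88))
x, y = mixup(clip, torch.tensor(3), torch.randn(29, 1, 88, 88),
             torch.tensor(7), num_classes=500)
```

Because mixup produces soft labels, the training loss would be computed against the blended one-hot targets (e.g. a soft cross-entropy) rather than hard class indices.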
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| Lipreading on Lip Reading in the Wild (LRW) | 3D Conv + ResNet-18 + DC-TCN + KD (Ensemble & Word Boundary) | Top-1 Accuracy: 94.1% |