5 months ago

Training Strategies for Improved Lip-reading

Ma Pingchuan ; Wang Yujiang ; Petridis Stavros ; Shen Jie ; Pantic Maja

Abstract

Several training strategies and temporal models have been recently proposed for isolated word lip-reading in a series of independent works. However, the potential of combining the best strategies and investigating the impact of each of them has not been explored. In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies, like self-distillation and using word boundary indicators. Our results show that Time Masking (TM) is the most important augmentation, followed by mixup, and Densely-Connected Temporal Convolutional Networks (DC-TCN) are the best temporal model for lip-reading of isolated words. Using self-distillation and word boundary indicators is also beneficial but to a lesser extent. A combination of all the above methods results in a classification accuracy of 93.4%, which is an absolute improvement of 4.6% over the current state-of-the-art performance on the LRW dataset. The performance can be further improved to 94.1% by pre-training on additional datasets. An error analysis of the various training strategies reveals that the performance improves by increasing the classification accuracy of hard-to-recognise words.
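
The abstract names Time Masking and mixup without giving their details, so the following is only a minimal sketch of how the two augmentations are commonly implemented, not the authors' code. It assumes a clip is a list of flattened frames (lists of floats), that a masked run is filled with the clip's mean frame, and that the mixup weight is drawn from a Beta(alpha, alpha) distribution. The function names and the values of `max_len` and `alpha` are illustrative assumptions.

```python
import random


def time_mask(frames, max_len, rng=random):
    """Replace one random run of consecutive frames with the clip's mean frame.

    Assumption: the run length is uniform in [0, max_len] and the fill value
    is the per-dimension mean over the whole clip.
    """
    n_frames = len(frames)
    if n_frames == 0:
        return []
    dim = len(frames[0])
    mean_frame = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    mask_len = rng.randint(0, min(max_len, n_frames))
    start = rng.randint(0, n_frames - mask_len)
    out = [list(f) for f in frames]  # copy so the input is left untouched
    for t in range(start, start + mask_len):
        out[t] = list(mean_frame)
    return out


def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two clips and their one-hot labels with a Beta-distributed weight.

    Assumption: both clips have the same number of frames and frame size.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed_x = [
        [lam * a + (1.0 - lam) * b for a, b in zip(frame_a, frame_b)]
        for frame_a, frame_b in zip(x_a, x_b)
    ]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y, lam


if __name__ == "__main__":
    rng = random.Random(0)
    clip = [[float(t), float(2 * t)] for t in range(10)]  # 10 toy frames
    masked = time_mask(clip, max_len=4, rng=rng)
    other = [[0.0, 0.0] for _ in range(10)]
    mixed_x, mixed_y, lam = mixup(masked, [1.0, 0.0], other, [0.0, 1.0], rng=rng)
    print("mixing weight:", lam)
    print("mixed label:", mixed_y)
```

In a real pipeline the frames would be image tensors and both augmentations would run per batch during training only. The list-based version here is meant just to make the two operations concrete.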

Benchmarks

Benchmark: lipreading-on-lip-reading-in-the-wild
Methodology: 3D Conv + ResNet-18 + DC-TCN + KD (Ensemble & Word Boundary)
Metrics: Top-1 Accuracy: 94.1
