A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

Prajwal K R ; Mukhopadhyay Rudrabha ; Namboodiri Vinay ; Jawahar C V

Abstract

In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment. Current works excel at producing accurate lip movements on a static image or videos of specific people seen during the training phase. However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio. We identify key reasons pertaining to this and hence resolve them by learning from a powerful lip-sync discriminator. Next, we propose new, rigorous evaluation benchmarks and metrics to accurately measure lip synchronization in unconstrained videos. Extensive quantitative evaluations on our challenging benchmarks show that the lip-sync accuracy of the videos generated by our Wav2Lip model is almost as good as real synced videos. We provide a demo video clearly showing the substantial impact of our Wav2Lip model and evaluation benchmarks on our website: cvit.iiit.ac.in/research/projects/cvit-projects/a-lip-sync-expert-is-all-you-need-for-speech-to-lip-generation-in-the-wild. The code and models are released at this GitHub repository: github.com/Rudrabha/Wav2Lip. You can also try out the interactive demo at this link: bhaasha.iiit.ac.in/lipsync.

Code Repositories

PrashanthaTP/wav2mov: pytorch, mentioned in GitHub
Rudrabha/Wav2Lip: official, pytorch, mentioned in GitHub
rockstar-0000/lip_sync_test: pytorch, mentioned in GitHub

Benchmarks

Benchmark           Methodology      FID      LSE-C    LSE-D
lip-sync-on-lrs2    Wav2Lip + GAN    4.446    -        6.469
lip-sync-on-lrs2    Wav2Lip          4.887    7.781    6.386
lip-sync-on-lrs3    Wav2Lip + GAN    4.35     7.574    6.986
lip-sync-on-lrs3    Wav2Lip          4.844    7.887    6.652
lip-sync-on-lrw     Wav2Lip          3.189    7.49     6.512
lip-sync-on-lrw     Wav2Lip + GAN    2.475    7.263    6.774
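
The benchmark figures above can be compared per dataset with a few lines of Python. The following is a minimal sketch, not part of the paper or the listing: the names `RESULTS`, `LOWER_IS_BETTER`, and `best_method` are hypothetical, and the metric directions (lower FID and LSE-D are better, higher LSE-C is better) are an assumption stated here rather than something the listing spells out. The one value the listing does not give is stored as `None` and is never filled in.

```python
# Figures copied from the benchmark listing; None marks a value the listing omits.
RESULTS = {
    ("lip-sync-on-lrs2", "Wav2Lip + GAN"): {"FID": 4.446, "LSE-C": None, "LSE-D": 6.469},
    ("lip-sync-on-lrs2", "Wav2Lip"): {"FID": 4.887, "LSE-C": 7.781, "LSE-D": 6.386},
    ("lip-sync-on-lrs3", "Wav2Lip + GAN"): {"FID": 4.35, "LSE-C": 7.574, "LSE-D": 6.986},
    ("lip-sync-on-lrs3", "Wav2Lip"): {"FID": 4.844, "LSE-C": 7.887, "LSE-D": 6.652},
    ("lip-sync-on-lrw", "Wav2Lip"): {"FID": 3.189, "LSE-C": 7.49, "LSE-D": 6.512},
    ("lip-sync-on-lrw", "Wav2Lip + GAN"): {"FID": 2.475, "LSE-C": 7.263, "LSE-D": 6.774},
}

# Assumed metric directions (an assumption, not stated in the listing itself).
LOWER_IS_BETTER = {"FID": True, "LSE-C": False, "LSE-D": True}


def best_method(benchmark, metric):
    """Return the better-scoring method on a benchmark, or None if a value is missing."""
    candidates = [
        (scores[metric], method)
        for (name, method), scores in RESULTS.items()
        if name == benchmark
    ]
    # Refuse to compare when any method lacks the metric, rather than guess.
    if not candidates or any(value is None for value, _ in candidates):
        return None
    pick = min if LOWER_IS_BETTER[metric] else max
    return pick(candidates)[1]


if __name__ == "__main__":
    for benchmark in sorted({name for name, _ in RESULTS}):
        for metric in ("FID", "LSE-C", "LSE-D"):
            print(benchmark, metric, best_method(benchmark, metric))
```

Under these assumptions, the GAN variant has the lower FID on every benchmark, while the plain Wav2Lip model has the lower LSE-D; the LSE-C comparison on lip-sync-on-lrs2 is skipped because one value is not listed.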
