
TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

Jung Chaeyoung; Lee Suyeon; Nam Kihyun; Rho Kyeongha; Kim You Jin; Jang Youngjoon; Chung Joon Son

Abstract

The goal of this work is Active Speaker Detection (ASD), the task of determining whether a person is speaking in a series of video frames. Previous work has dealt with the task by exploring network architectures, while learning effective representations has been less explored. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is applied only to the parts of the full segments where a person on the screen is actually speaking. This encourages the model to learn effective representations through the natural correspondence of speech and facial movements. Our loss can be jointly optimized with the existing objectives for training ASD models without the need for additional supervision or training data. The experiments demonstrate that our loss can be easily integrated into existing ASD frameworks, improving their performance. Our method achieves state-of-the-art performance on the AVA-ActiveSpeaker and ASW datasets.
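As a rough illustration of the idea described in the abstract, the sketch below computes a contrastive loss over only the frames labelled as speaking and adds it to an existing ASD objective. It is not the authors' implementation: the InfoNCE form, the cosine similarity, the visual-to-audio direction, the use of other speaking frames in the same segment as negatives, the `temperature` default, the `weight` default, and the function names `talk_aware_nce` and `joint_loss` are all assumptions made for this example. Plain Python is used so the sketch runs without any framework.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def talk_aware_nce(visual, audio, speaking, temperature=0.1):
    """InfoNCE-style loss restricted to speaking frames (illustrative only).

    visual, audio: per-frame embeddings as lists of equal-length float lists.
    speaking: per-frame 0/1 labels; frames labelled 0 are ignored entirely.
    temperature: assumed softmax temperature, not taken from the paper.
    """
    active = [t for t, label in enumerate(speaking) if label]
    if len(active) < 2:
        return 0.0  # fewer than two speaking frames leaves no negatives

    v = [_normalize(visual[t]) for t in active]
    a = [_normalize(audio[t]) for t in active]

    total = 0.0
    for i in range(len(active)):
        # Cosine similarity between visual frame i and every speaking audio frame.
        logits = [
            sum(x * y for x, y in zip(v[i], a[j])) / temperature
            for j in range(len(active))
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # The positive pair is the audio frame at the same time step.
        total += log_denom - logits[i]
    return total / len(active)


def joint_loss(asd_loss, visual, audio, speaking, weight=0.3):
    """Add the weighted contrastive term to an existing ASD loss value.

    weight is a hypothetical balancing coefficient chosen for this sketch.
    """
    return asd_loss + weight * talk_aware_nce(visual, audio, speaking)
```

In this sketch, time-aligned audio and visual embeddings on speaking frames yield a small loss, swapping them yields a large one, and whatever the embeddings contain on non-speaking frames has no effect. The term needs only the per-frame speaking labels that ASD training already uses, which is consistent with the abstract's claim that no additional supervision is required.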

Code Repositories

kaistmm/TalkNCE

pytorch

Benchmarks

Benchmark	Methodology	Metrics
audio-visual-active-speaker-detection-on-ava	LoCoNet+TalkNCE	validation mean average precision: 95.5%
