TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning
Jung Chaeyoung ; Lee Suyeon ; Nam Kihyun ; Rho Kyeongha ; Kim You Jin ; Jang Youngjoon ; Chung Joon Son

Abstract
The goal of this work is Active Speaker Detection (ASD), a task to determine whether a person is speaking or not in a series of video frames. Previous works have dealt with the task by exploring network architectures while learning effective representations has been less explored. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is only applied to part of the full segments where a person on the screen is actually speaking. This encourages the model to learn effective representations through the natural correspondence of speech and facial movements. Our loss can be jointly optimized with the existing objectives for training ASD models without the need for additional supervision or training data. The experiments demonstrate that our loss can be easily integrated into the existing ASD frameworks, improving their performance. Our method achieves state-of-the-art performances on AVA-ActiveSpeaker and ASW datasets.
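The abstract does not give the exact form of the loss, so the following is only a minimal sketch of what a talk-aware contrastive objective could look like. It assumes frame-level audio and visual embeddings, a per-frame speaking label, and a symmetric InfoNCE formulation in which the audio and visual embeddings from the same speaking frame form the positive pair and the other speaking frames in the segment serve as negatives. The function name `talk_aware_info_nce`, the temperature value, and the symmetric averaging are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def talk_aware_info_nce(audio_emb, visual_emb, speaking_mask, temperature=0.07):
    """Hypothetical talk-aware InfoNCE loss over one video segment.

    audio_emb, visual_emb: arrays of shape (T, D), one embedding per frame.
    speaking_mask: boolean array of shape (T,), True where the person speaks.
    temperature: softmax temperature (assumed value, not from the paper).
    """
    # Keep only frames where the on-screen person is actually speaking.
    a = audio_emb[speaking_mask]
    v = visual_emb[speaking_mask]

    # With fewer than two speaking frames there are no negatives to contrast.
    if a.shape[0] < 2:
        return 0.0

    # L2-normalize so the dot product is a cosine similarity.
    eps = 1e-8
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + eps)

    # logits[i, j] compares audio frame i with visual frame j.
    logits = (a @ v.T) / temperature

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row; the positive is on the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_prob)))

    # Average the audio-to-visual and visual-to-audio directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))
```

In a joint training setup of the kind the abstract describes, a term like this would be added, with some weight, to the existing ASD classification objective. Frames outside the speaking mask contribute nothing to it.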
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| audio-visual-active-speaker-detection-on-ava | LoCoNet+TalkNCE | validation mean average precision: 95.5% |