5 months ago

End-to-End Active Speaker Detection

Alcazar Juan Leon ; Cordes Moritz ; Zhao Chen ; Ghanem Bernard

Abstract

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we propose an end-to-end ASD workflow where feature learning and contextual predictions are jointly learned. Our end-to-end trainable network simultaneously learns multi-modal embeddings and aggregates spatio-temporal context. This results in more suitable feature representations and improved performance in the ASD task. We also introduce interleaved graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem. Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance. Finally, we design a weakly-supervised strategy, which demonstrates that the ASD problem can also be approached by utilizing audiovisual data but relying exclusively on audio annotations. We achieve this by modelling the direct relationship between the audio signal and the possible sound sources (speakers), as well as introducing a contrastive loss. All the resources of this project will be made available at: https://github.com/fuankarion/end-to-end-asd.
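
The idea of splitting message passing by source of context can be illustrated with a small sketch. It is not the authors' iGNN implementation: it assumes the two sources of context are spatial (speakers visible in the same frame) and temporal (the same speaker across frames), which the abstract suggests but does not spell out, and it replaces the learned layers with a parameter-free mean over neighbours. The names `mean_aggregate` and `interleaved_block` and the node labels are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, List[float]]
Edges = List[Tuple[str, str]]


def mean_aggregate(features: Features, edges: Edges) -> Features:
    """One round of undirected message passing.

    Each node is replaced by the mean of its own feature vector and the
    feature vectors of its neighbours (a stand-in for a learned GNN layer).
    """
    neighbours = {node: [node] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated: Features = {}
    for node, group in neighbours.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[n][d] for n in group) / len(group) for d in range(dim)
        ]
    return updated


def interleaved_block(
    features: Features, spatial_edges: Edges, temporal_edges: Edges
) -> Features:
    """Assumed interleaving: a spatial round followed by a temporal round."""
    after_spatial = mean_aggregate(features, spatial_edges)
    return mean_aggregate(after_spatial, temporal_edges)


# Toy graph: two speakers (A, B) observed in two frames (0, 1).
features = {"A@0": [4.0], "B@0": [0.0], "A@1": [2.0], "B@1": [0.0]}
spatial_edges = [("A@0", "B@0"), ("A@1", "B@1")]    # same frame
temporal_edges = [("A@0", "A@1"), ("B@0", "B@1")]   # same speaker
result = interleaved_block(features, spatial_edges, temporal_edges)
```

Keeping the two edge sets separate means each round mixes information along only one kind of relation, so the order and number of spatial and temporal rounds become explicit design choices rather than a side effect of a single merged graph.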

Code Repositories

tiago-roxo/bias (pytorch, mentioned in GitHub)
fuankarion/end-to-end-asd (official, pytorch, mentioned in GitHub)
tiago-roxo/asdnb (pytorch, mentioned in GitHub)

Benchmarks

Benchmark: audio-visual-active-speaker-detection-on-ava
Methodology: EASEE-50
Metrics: validation mean average precision: 94.1%
