5 months ago

Active Speakers in Context

Alcazar Juan Leon ; Heilbron Fabian Caba ; Mai Long ; Perazzi Federico ; Lee Joon-Young ; Arbelaez Pablo ; Ghanem Bernard

Abstract

Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker. Although this strategy can be enough for addressing single-speaker scenarios, it prevents accurate detection when the task is to identify which of many candidate speakers are talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our Active Speaker Context is designed to learn pairwise and temporal relations from a structured ensemble of audio-visual observations. Our experiments show that a structured feature ensemble already benefits the active speaker detection performance. Moreover, we find that the proposed Active Speaker Context improves the state-of-the-art on the AVA-ActiveSpeaker dataset, achieving a mAP of 87.1%. We present ablation studies that verify that this result is a direct consequence of our long-term multi-speaker analysis.
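To make the idea of a structured ensemble of audio-visual observations more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each speaker already has one feature vector per timestep and that all tracks have the same length. The function name `build_context_ensemble`, its parameters, and the padding rule (cycling through the available context speakers) are hypothetical choices made for illustration only.

```python
from typing import Dict, List

Feature = List[float]


def build_context_ensemble(
    tracks: Dict[str, List[Feature]],
    reference: str,
    center: int,
    num_steps: int,
    stride: int,
    num_speakers: int,
) -> List[List[Feature]]:
    """Return a [num_steps][num_speakers] grid of feature vectors.

    Each row holds the features of the selected speakers at one sampled
    timestep; column 0 is always the reference speaker.
    """
    # Hypothetical ordering: reference first, then the other speakers by name.
    others = sorted(name for name in tracks if name != reference)
    pool = others if others else [reference]

    # Assumption: pad by cycling through the pool so the grid is rectangular.
    order = [reference]
    i = 0
    while len(order) < num_speakers:
        order.append(pool[i % len(pool)])
        i += 1

    length = len(tracks[reference])
    half = num_steps // 2
    ensemble: List[List[Feature]] = []
    for k in range(-half, num_steps - half):
        # Sample a long horizon around `center`, clamped to the track bounds.
        t = min(max(center + k * stride, 0), length - 1)
        ensemble.append([tracks[name][t] for name in order])
    return ensemble


tracks = {
    "a": [[0.0], [1.0], [2.0], [3.0], [4.0]],
    "b": [[10.0], [11.0], [12.0], [13.0], [14.0]],
}
grid = build_context_ensemble(
    tracks, "a", center=2, num_steps=3, stride=2, num_speakers=3
)
```

A grid like this places every candidate speaker side by side at each sampled time, which is the kind of layout a downstream model could use to learn pairwise (across columns) and temporal (across rows) relations.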

Code Repositories

fuankarion/active-speakers-context (official, PyTorch)

Benchmarks

Benchmark: audio-visual-active-speaker-detection-on-ava
Methodology: Active Speakers in Context
Metrics: validation mean average precision: 87.1%
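
For readers unfamiliar with the metric, the sketch below computes average precision for one set of ranked predictions using a common non-interpolated definition (the mean of the precision values at each correctly ranked positive). It is illustrative only: the function name `average_precision` is hypothetical, and the official AVA-ActiveSpeaker evaluation tool may differ in its details.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision for binary labels (1 = speaking)."""
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision among the top `rank` predictions.
            precision_sum += hits / rank
    # With no positives the metric is undefined; this sketch returns 0.0.
    return precision_sum / hits if hits else 0.0


ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

In this toy example the positives land at ranks 1 and 3, so the precisions averaged are 1/1 and 2/3.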
