UniCon: Unified Context Network for Robust Active Speaker Detection

Zhang Yuanhang ; Liang Susan ; Yang Shuang ; Liu Xiao ; Wu Zhongqin ; Shan Shiguang ; Chen Xilin

Abstract

We introduce a new efficient framework, the Unified Context Network (UniCon), for robust active speaker detection (ASD). Traditional methods for ASD usually operate on each candidate's pre-cropped face track separately and do not sufficiently consider the relationships among the candidates. This potentially limits performance, especially in challenging scenarios with low-resolution faces, multiple candidates, etc. Our solution is a novel, unified framework that focuses on jointly modeling multiple types of contextual information: spatial context to indicate the position and scale of each candidate's face, relational context to capture the visual relationships among the candidates and contrast audio-visual affinities with each other, and temporal context to aggregate long-term information and smooth out local uncertainties. Based on such information, our model optimizes all candidates in a unified process for robust and reliable ASD. A thorough ablation study is performed on several challenging ASD benchmarks under different settings. In particular, our method outperforms the state-of-the-art by a large margin of about 15% mean Average Precision (mAP) absolute on two challenging subsets: one with three candidate speakers, and the other with faces smaller than 64 pixels. Together, our UniCon achieves 92.0% mAP on the AVA-ActiveSpeaker validation set, surpassing 90% for the first time on this challenging dataset at the time of submission. Project website: https://unicon-asd.github.io/.

Benchmarks

Benchmark: audio-visual-active-speaker-detection-on-ava
Methodology: UniCon
Metrics: validation mean average precision: 92.0%
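
For readers unfamiliar with the metric reported here, the following is a minimal sketch of how average precision can be computed from per-face speaking scores and binary speaking labels. It is an illustration only, not the official AVA-ActiveSpeaker evaluation code, which may differ in details such as interpolation and tie handling. The function name `average_precision` and the sample scores and labels are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision.

    scores: predicted "is speaking" confidences, one per face crop.
    labels: ground-truth flags (1 = speaking, 0 = not speaking).
    Returns the mean of the precision values measured at the rank of
    each positive example, after sorting by descending score.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    total_positives = sum(1 for _, label in ranked if label)
    if total_positives == 0:
        return 0.0  # Assumption: AP is defined as 0 when there are no positives.

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank

    return precision_sum / total_positives


# Hypothetical example: four face crops, two of which are speaking.
example_scores = [0.9, 0.8, 0.7, 0.6]
example_labels = [1, 0, 1, 0]
ap = average_precision(example_scores, example_labels)
```

In this example the positives are ranked first and third, so the precision values are 1/1 and 2/3, and their mean is about 0.833.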
