Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

Abstract
Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows. In this paper, we present SPELL, a novel spatial-temporal graph learning framework that can solve complex tasks such as ASD. To this end, each person in a video frame is first encoded in a unique node for that frame. Nodes corresponding to a single person across frames are connected to encode their temporal dynamics. Nodes within a frame are also connected to encode inter-person relationships. Thus, SPELL reduces ASD to a node classification task. Importantly, SPELL is able to reason over long temporal contexts for all nodes without relying on computationally expensive fully connected graph neural networks. Through extensive experiments on the AVA-ActiveSpeaker dataset, we demonstrate that learning graph-based representations can significantly improve the active speaker detection performance owing to its explicit spatial and temporal structure. SPELL outperforms all previous state-of-the-art approaches while requiring significantly lower memory and computational resources. Our code is publicly available at https://github.com/SRA2/SPELL
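The graph construction described in the abstract can be illustrated with a minimal sketch. It is not taken from the SPELL repository: the function name `build_edges`, the `(frame_index, person_id)` node format, and the `max_gap` parameter (a cap on how far apart in time two nodes of the same person may be linked) are all assumptions made here for illustration.

```python
from itertools import combinations
from typing import Hashable, List, Tuple

Node = Tuple[int, Hashable]  # (frame_index, person_id); a hypothetical format


def build_edges(nodes: List[Node], max_gap: int = 1) -> List[Tuple[int, int]]:
    """Return undirected edges (i, j) with i < j over node indices.

    Two nodes are linked when they are in the same frame (inter-person
    edge) or belong to the same person in frames at most `max_gap`
    apart (temporal edge). `max_gap` is an assumed parameter that keeps
    the graph sparse instead of fully connected.
    """
    edges = set()
    for i, j in combinations(range(len(nodes)), 2):
        (frame_i, person_i), (frame_j, person_j) = nodes[i], nodes[j]
        same_frame = frame_i == frame_j
        same_person_nearby = (
            person_i == person_j and 0 < abs(frame_i - frame_j) <= max_gap
        )
        if same_frame or same_person_nearby:
            edges.add((i, j))
    return sorted(edges)


# Two people (A, B) visible in frames 0 and 1; only A is visible in frame 2.
nodes = [(0, "A"), (0, "B"), (1, "A"), (1, "B"), (2, "A")]
edges = build_edges(nodes, max_gap=1)
print(edges)  # [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)]
```

Each node would then carry an audiovisual feature vector and receive a speaking or not-speaking label, which is what turns ASD into node classification. Raising `max_gap` adds longer-range temporal edges at the cost of a denser graph.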
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| audio-visual-active-speaker-detection-on-ava | SPELL | validation mean average precision: 94.2% |
| audio-visual-active-speaker-detection-on-ava | SPELL+ | validation mean average precision: 94.9% |
| node-classification-on-ava | UniCon [zhang2021unicon] | mAP: 92 |
| node-classification-on-ava | MAAS-TAN [MAAS2021] | mAP: 88.8 |
| node-classification-on-ava | ASDNet [ASDNet_ICCV2021] | mAP: 93.5 |
| node-classification-on-ava | TalkNet [tao2021someone] | mAP: 92.3 |