Temporal and cross-modal attention for audio-visual zero-shot learning
Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

Abstract
Audio-visual generalised zero-shot learning for video classification requires understanding the relations between the audio and visual information in order to recognise samples from novel, previously unseen classes at test time. The natural semantic and temporal alignment between audio and visual data in videos can be exploited to learn powerful representations that generalise to unseen classes at test time. We propose a multi-modal and Temporal Cross-attention Framework (TCaF) for audio-visual generalised zero-shot learning. Its inputs are temporally aligned audio and visual features obtained from pre-trained networks. Encouraging the framework to focus on cross-modal correspondence across time, instead of self-attention within each modality, boosts performance significantly. We show that our proposed framework, which ingests temporal features, yields state-of-the-art performance on the UCF-GZSL, VGGSound-GZSL, and ActivityNet-GZSL benchmarks for (generalised) zero-shot learning. Code for reproducing all results is available at https://github.com/ExplainableML/TCAF-GZSL.
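The core idea described above, queries from one modality attending over keys/values from the other, can be illustrated with a minimal NumPy sketch of scaled dot-product cross-attention. This is a simplified illustration, not the paper's actual architecture: it omits the learned projections, multiple heads, positional encodings, and layer stacking that TCaF uses, and all array names and sizes here are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(q_feats, kv_feats):
    """Tokens of one modality (queries) attend over tokens of the
    other modality (keys/values).
    q_feats: (T_q, d), kv_feats: (T_kv, d) -> (T_q, d)."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)  # (T_q, T_kv) similarities
    attn = softmax(scores, axis=-1)             # each query row sums to 1
    return attn @ kv_feats                      # weighted mix of the other modality

# Hypothetical temporally aligned features from pre-trained networks.
rng = np.random.default_rng(0)
audio = rng.standard_normal((8, 64))    # 8 audio time steps, 64-dim features
visual = rng.standard_normal((10, 64))  # 10 visual time steps, 64-dim features

# Each modality is updated with information from the other.
audio_updated = cross_modal_attention(audio, visual)
visual_updated = cross_modal_attention(visual, audio)
```

In contrast, self-attention within a modality would call `cross_modal_attention(audio, audio)`; the paper's finding is that favouring the cross-modal direction across time is what helps generalisation to unseen classes.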
Benchmarks

HM is the harmonic mean of the seen-class and unseen-class accuracies in the generalised zero-shot setting; ZSL is the accuracy on unseen classes only. Both are reported in percent.

| Benchmark | Method | Metrics |
|---|---|---|
| gzsl-video-classification-on-activitynet-gzsl | TCaF | HM: 12.20 ZSL: 7.96 |
| gzsl-video-classification-on-activitynet-gzsl-1 | TCaF | HM: 10.71 ZSL: 7.91 |
| gzsl-video-classification-on-ucf-gzsl-cls | TCaF | HM: 50.78 ZSL: 44.64 |
| gzsl-video-classification-on-ucf-gzsl-main | TCaF | HM: 31.72 ZSL: 24.81 |
| gzsl-video-classification-on-vggsound-gzsl | TCaF | HM: 8.77 ZSL: 7.41 |
| gzsl-video-classification-on-vggsound-gzsl-1 | TCaF | HM: 7.33 ZSL: 6.06 |