4 months ago

Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method

Jin Wenping ; Zhu Li ; Sun Jing

Abstract

Weakly supervised violence detection refers to the technique of training models to identify violent segments in videos using only video-level labels. Among these approaches, multimodal violence detection, which integrates modalities such as audio and optical flow, holds great potential. Existing methods in this domain primarily focus on designing multimodal fusion models to address modality discrepancies. In contrast, we take a different approach, leveraging the inherent discrepancies across modalities in violence event representation to propose a novel multimodal semantic feature alignment method. This method sparsely maps the semantic features of local, transient, and less informative modalities (such as audio and optical flow) into the more informative RGB semantic feature space. Through an iterative process, the method identifies a suitable non-zero feature matching subspace and aligns the modality-specific event representations based on this subspace, enabling the full exploitation of information from all modalities during the subsequent modality fusion stage. Building on this, we design a new weakly supervised violence detection framework that consists of unimodal multiple-instance learning for extracting unimodal semantic features, multimodal alignment, multimodal fusion, and final detection. Experimental results on benchmark datasets demonstrate the effectiveness of our method, achieving an average precision (AP) of 86.07% on the XD-Violence dataset. Our code is available at https://github.com/xjpp2016/MAVD.
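The abstract does not say how the multiple-instance learning stage pools segment scores into a video-level prediction. As a rough illustration of training from video-level labels only, the sketch below assumes top-k pooling, a common choice in weakly supervised video anomaly detection; it is not taken from the authors' code. The function names (`topk_mil_score`, `mil_loss`), the value of `k`, and the example scores are all hypothetical.

```python
import math


def topk_mil_score(segment_scores, k):
    """Video-level score: mean of the k highest per-segment scores (assumed pooling)."""
    if not segment_scores:
        raise ValueError("need at least one segment score")
    k = max(1, min(k, len(segment_scores)))
    top = sorted(segment_scores, reverse=True)[:k]
    return sum(top) / k


def binary_cross_entropy(prob, label, eps=1e-7):
    """Binary cross-entropy for one prediction, clipped to avoid log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mil_loss(segment_scores, video_label, k=3):
    """Loss that uses only the video-level label (1 = violent, 0 = normal)."""
    return binary_cross_entropy(topk_mil_score(segment_scores, k), video_label)


# Hypothetical per-segment violence probabilities for two videos.
violent_video = [0.05, 0.10, 0.92, 0.88, 0.07, 0.81]
normal_video = [0.05, 0.10, 0.08, 0.12, 0.07, 0.03]

violent_score = topk_mil_score(violent_video, k=3)
normal_score = topk_mil_score(normal_video, k=3)
```

Because only the top-scoring segments contribute to the loss, a violent video can be labeled correctly even when most of its segments are uneventful, which is what makes segment-level localization possible without segment-level labels.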

Code Repositories

xjpp2016/mavd
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
anomaly-detection-in-surveillance-videos-on-2 | MAVD | AP: 86.07
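The page reports the AP value without describing how it is computed. As a minimal sketch, assuming the usual non-interpolated definition (the mean of the precision values at the rank of each positive item), average precision can be computed as follows. The function name `average_precision` and the toy scores are hypothetical, and treating each scored item as one frame is an assumption about the evaluation protocol.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean of precision at the rank of each positive.

    scores: predicted violence scores, one per item (for example, per frame).
    labels: ground-truth labels, 1 for violent and 0 for normal.
    Ties in scores are broken by input order, which is a simplification.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("AP is undefined without positive labels")

    # Rank items from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / total_positives


# Toy example: positives are ranked first and third.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

In this toy case the precision is 1 at the first positive and 2/3 at the second, so the result is 5/6; a perfect ranking would give 1.0.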
