Learning Latent Super-Events to Detect Multiple Activities in Videos
AJ Piergiovanni; Michael S. Ryoo

Abstract
In this paper, we introduce the concept of learning latent super-events from activity videos, and present how it benefits activity detection in continuous videos. We define a super-event as a set of multiple events occurring together in videos with a particular temporal organization; it is the opposite concept to a sub-event. Real-world videos contain multiple activities and are rarely segmented (e.g., surveillance videos), and learning latent super-events allows the model to capture how the events are temporally related in videos. We design temporal structure filters that enable the model to focus on particular sub-intervals of the videos, and use them together with a soft attention mechanism to learn representations of latent super-events. Super-event representations are combined with per-frame or per-segment CNNs to provide frame-level annotations. Our approach is designed to be fully differentiable, enabling end-to-end learning of latent super-event representations jointly with the activity detector using them. Our experiments with multiple public video datasets confirm that the proposed concept of latent super-event learning significantly benefits activity detection, advancing the state of the art.
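To make the mechanism concrete, below is a minimal PyTorch sketch of a temporal structure filter with soft attention, written from the abstract's description: each filter is a small set of Cauchy distributions over relative time with learnable centers and widths, and a softmax over filters plays the role of the soft attention that forms the super-event representation. All class, parameter, and default names here are illustrative assumptions, not taken from the authors' released code.

```python
import math

import torch
import torch.nn as nn


class SuperEventModule(nn.Module):
    """Sketch of temporal structure filters with soft attention.

    Each filter is a set of Cauchy distributions over relative time
    with learnable centers and widths; a softmax over the filters
    serves as the soft attention forming the super-event
    representation. Names and defaults are illustrative only.
    """

    def __init__(self, num_filters=3, num_cauchy=3):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_filters, num_cauchy))
        self.log_widths = nn.Parameter(torch.zeros(num_filters, num_cauchy))
        self.attn_logits = nn.Parameter(torch.zeros(num_filters))

    def forward(self, features):
        # features: (T, D) per-frame (or per-segment) CNN features.
        T, _ = features.shape
        t = torch.linspace(-1.0, 1.0, T, device=features.device)
        centers = torch.tanh(self.centers).unsqueeze(-1)   # keep centers in (-1, 1)
        widths = torch.exp(self.log_widths).unsqueeze(-1)  # keep widths positive
        # Cauchy kernels over time, normalized to sum to 1: (filters, cauchy, T).
        kernel = 1.0 / (math.pi * widths * (1.0 + ((t - centers) / widths) ** 2))
        kernel = kernel / kernel.sum(dim=-1, keepdim=True)
        # Each filter pools the video with its averaged kernels: (filters, D).
        pooled = kernel.mean(dim=1) @ features
        # Soft attention over filters yields the super-event representation: (D,).
        weights = torch.softmax(self.attn_logits, dim=0)
        return weights @ pooled


# Hypothetical usage: attach the video-level super-event representation
# to every per-frame feature before frame-level classification.
if __name__ == "__main__":
    T, D = 64, 1024
    features = torch.randn(T, D)                # per-frame CNN features
    super_event = SuperEventModule()(features)  # (D,)
    frame_inputs = torch.cat([features, super_event.expand(T, D)], dim=1)
    print(frame_inputs.shape)                   # torch.Size([64, 2048])
```

Because the centers, widths, and attention weights are ordinary parameters, gradients flow through the temporal pooling, consistent with the abstract's claim that the latent super-event representations are learned end-to-end jointly with the activity detector.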
Benchmarks
| Benchmark | Methodology | mAP (%) |
|---|---|---|
| Action Detection on Charades | Super-events (RGB+Flow) | 19.41 |
| Action Detection on MultiTHUMOS | I3D + our super-event | 36.4 |