Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer

Yadong Mu, Hao Jiang

Abstract

Video summarization has recently attracted increasing attention in the computer vision community. However, the scarcity of annotated data has been a key obstacle for this task. To address it, this work explores a new solution for video summarization by transferring samples from a correlated task (i.e., video moment localization) that has abundant training data. Our main insight is that annotated video moments also indicate the semantic highlights of a video, and are thus essentially similar to a video summary. Approximately, the video summary can be treated as a sparse, redundancy-free version of the video moments. Inspired by this observation, we propose an importance Propagation based collaborative Teaching Network (iPTNet). It consists of two separate modules that conduct video summarization and moment localization, respectively. Each module estimates a frame-wise importance map that indicates keyframes or moments. To perform cross-task sample transfer, we devise an importance propagation module that converts between summarization-guided and localization-guided importance maps. This conversion is what enables optimizing one task using data from the other. Additionally, to avoid error amplification caused by batch-wise joint training, we devise a collaborative teaching scheme, which adopts a cross-task mean teaching strategy to jointly optimize the two tasks and provide robust frame-level teaching signals. Extensive experiments on video summarization benchmarks demonstrate that iPTNet significantly outperforms previous state-of-the-art video summarization methods, serving as an effective solution to the data scarcity issue in video summarization.
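The abstract names a mean teaching strategy but gives no implementation details. As a rough illustration only, the sketch below shows the generic mean-teacher idea: a teacher whose weights are an exponential moving average (EMA) of the student's weights, and a consistency loss between their frame-level importance scores. The function names, the `decay` value, the dictionary representation of weights, and the squared-error loss are all assumptions made for this example, not details taken from iPTNet.

```python
def ema_update(teacher, student, decay=0.99):
    """Move teacher weights toward student weights by exponential moving average.

    `teacher` and `student` are hypothetical dicts mapping parameter names to
    floats; a real model would hold tensors instead.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def consistency_loss(student_scores, teacher_scores):
    """Mean squared error between two frame-level importance sequences.

    The squared-error form is an assumption; the abstract does not state
    which loss is used for the teaching signal.
    """
    if len(student_scores) != len(teacher_scores):
        raise ValueError("score sequences must have the same length")
    if not student_scores:
        raise ValueError("score sequences must not be empty")
    total = sum((s - t) ** 2 for s, t in zip(student_scores, teacher_scores))
    return total / len(student_scores)


if __name__ == "__main__":
    teacher = {"w": 1.0, "b": 0.0}
    student = {"w": 0.0, "b": 2.0}
    teacher = ema_update(teacher, student, decay=0.5)
    print(teacher)
    print(consistency_loss([0.2, 0.9, 0.1], [0.3, 0.7, 0.1]))
```

In a cross-task setting, one would presumably let the teacher of one task supervise the student of the other through converted importance maps; how iPTNet does this is not described on this page.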

Benchmarks

| Benchmark | Methodology | F1-score (Augmented) | F1-score (Canonical) | Kendall's Tau | Spearman's Rho |
| --- | --- | --- | --- | --- | --- |
| supervised-video-summarization-on-summe | iPTNet | 56.9 | 54.5 | 0.101 | 0.119 |
| supervised-video-summarization-on-tvsum | iPTNet | 64.2 | 63.4 | 0.134 | 0.163 |
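Kendall's Tau and Spearman's Rho are rank correlations between predicted and reference frame-level importance scores. The sketch below is a minimal pure-Python illustration of both. It is not the evaluation code behind the numbers above: the function names are hypothetical, and the Kendall variant shown is tau-a, which does not correct for ties, whereas library implementations such as `scipy.stats.kendalltau` default to a tie-corrected variant.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(pred, ref):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(pred), average_ranks(ref))


def kendall_tau_a(pred, ref):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(pred)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (pred[i] - pred[j]) * (ref[i] - ref[j])
            if sign > 0:
                score += 1
            elif sign < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    predicted = [0.1, 0.4, 0.3, 0.8]   # made-up importance scores
    reference = [0.2, 0.5, 0.9, 0.7]   # made-up annotator scores
    print(kendall_tau_a(predicted, reference))
    print(spearman_rho(predicted, reference))
```

Both measures range from -1 to 1, with 0 indicating no monotonic agreement between the two rankings.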
