Summarizing Videos with Attention
Jiri Fajtl; Hajar Sadeghi Sokeh; Vasileios Argyriou; Dorothy Monekosso; Paolo Remagnino

Abstract
In this work we propose a novel method for supervised, keyshot-based video summarization by applying a conceptually simple and computationally efficient soft self-attention mechanism. Current state-of-the-art methods leverage bidirectional recurrent networks, such as BiLSTM, combined with attention. These networks are complex to implement and computationally demanding compared to fully connected networks. To that end, we propose a simple, self-attention-based network for video summarization that performs the entire sequence-to-sequence transformation in a single feed-forward pass, and a single backward pass during training. Our method sets new state-of-the-art results on two benchmarks commonly used in this domain, TVSum and SumMe.
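The core operation the abstract describes, soft self-attention over a sequence of frame features, can be sketched as below. This is a minimal single-head illustration, not the paper's exact architecture (VASNet additionally includes components such as a frame-score regressor on top of the attention output); the function and weight names are hypothetical.

```python
import numpy as np

def soft_self_attention(X, Wq, Wk, Wv):
    """Single-head soft self-attention over frame features.

    X  : (T, d) array, one d-dimensional feature per video frame
    Wq, Wk, Wv : (d, d) projection matrices (hypothetical names)
    Returns (context, A): per-frame context vectors and the
    (T, T) attention weight matrix, each row summing to 1.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Scaled dot-product scores between every pair of frames
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Softmax over keys: each frame attends softly to all frames
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    A = e / e.sum(axis=1, keepdims=True)
    return A @ V, A
```

Because each frame attends to every other frame in one matrix product, the whole sequence is transformed in a single feed-forward pass, in contrast to the step-by-step recurrence of a BiLSTM.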
Benchmarks
| Benchmark | Method | F1-score (Canonical) | F1-score (Augmented) |
|---|---|---|---|
| Video Summarization on SumMe | VASNet | 49.71 | 51.09 |
| Video Summarization on TVSum | VASNet | 61.42 | 62.37 |