Hierarchical Self-Attention Network for Action Localization in Videos

Wen-Hsien Fang, Yie-Tarng Chen, Rizard Renanda Adhi Pramono

Abstract

This paper presents a novel Hierarchical Self-Attention Network (HISAN) that generates spatial-temporal tubes for action localization in videos. HISAN combines a two-stream convolutional neural network (CNN) with a hierarchical bidirectional self-attention mechanism. This mechanism comprises two levels of bidirectional self-attention that capture both long-term temporal dependencies and spatial context, yielding more precise action localization. A sequence rescoring (SR) algorithm is also employed to resolve the inconsistent detection scores caused by occlusion or background clutter. Moreover, a new fusion scheme integrates not only the appearance and motion information from the two-stream network but also motion saliency, which mitigates the effect of camera motion. Simulations show that the new approach achieves performance competitive with state-of-the-art works in action localization and recognition accuracy on the widely used UCF101-24 and J-HMDB datasets.
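
The abstract does not spell out how the bidirectional self-attention is computed. One common reading, assumed here, is to run scaled dot-product self-attention twice over a sequence of per-frame features, once restricted to past frames and once to future frames, and then concatenate the two results. The sketch below illustrates that reading only. The function names, the use of raw features as queries, keys, and values (no learned projections), and the concatenation step are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def directional_self_attention(x, direction):
    """Scaled dot-product self-attention limited to one temporal direction.

    x: array of shape (T, d), one feature vector per frame (hypothetical input).
    direction: "forward" lets frame t attend to frames <= t,
               "backward" lets frame t attend to frames >= t.
    """
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)  # (T, T) pairwise similarities
    if direction == "forward":
        allowed = np.tril(np.ones((T, T), dtype=bool))
    elif direction == "backward":
        allowed = np.triu(np.ones((T, T), dtype=bool))
    else:
        raise ValueError("direction must be 'forward' or 'backward'")
    # Masked positions get -inf so their softmax weight is exactly zero.
    scores = np.where(allowed, scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ x  # (T, d) context vectors


def bidirectional_self_attention(x):
    """Concatenate forward and backward context (an assumed combination rule)."""
    fwd = directional_self_attention(x, "forward")
    bwd = directional_self_attention(x, "backward")
    return np.concatenate([fwd, bwd], axis=-1)  # (T, 2d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(5, 4))  # 5 frames, 4-dimensional features
    print(bidirectional_self_attention(frames).shape)
```

Because the diagonal is always unmasked, every row keeps at least one finite score, so the softmax never divides by zero. A hierarchical variant would apply such a layer at two levels, but how HISAN defines those levels is not stated on this page.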

Benchmarks

Benchmark                      Methodology               Metrics
action-detection-on-j-hmdb     HISAN (VGG-16)            Frame-mAP 0.5: 76.72; Video-mAP 0.2: 85.97; Video-mAP 0.5: 84.02
action-detection-on-j-hmdb     HISAN (ResNet-101 + FPN)  Video-mAP 0.2: 87.59; Video-mAP 0.5: 86.49
action-detection-on-ucf101-24  HISAN (ResNet-101 + FPN)  Video-mAP 0.2: 82.30; Video-mAP 0.5: 51.47
action-detection-on-ucf101-24  HISAN (VGG-16)            Frame-mAP 0.5: 73.71; Video-mAP 0.2: 80.42; Video-mAP 0.5: 49.50
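
The numbers after each metric name (0.2, 0.5) are overlap thresholds: a detection counts as correct only if its intersection-over-union (IoU) with the ground truth reaches that value. This page does not define the overlap measure. The sketch below assumes the convention commonly used in the action localization literature: box IoU per frame for Frame-mAP, and for Video-mAP a spatio-temporal tube IoU computed as temporal IoU multiplied by the mean box IoU over shared frames. The tube representation (a dict from frame index to box) and the function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(tube_a, tube_b):
    """Spatio-temporal IoU of two tubes (assumed convention).

    Each tube maps a frame index to a box (x1, y1, x2, y2).
    Result = temporal IoU * mean spatial IoU over frames both tubes cover.
    """
    frames_a, frames_b = set(tube_a), set(tube_b)
    shared = frames_a & frames_b
    if not shared:
        return 0.0
    temporal_iou = len(shared) / len(frames_a | frames_b)
    spatial_iou = sum(box_iou(tube_a[f], tube_b[f]) for f in shared) / len(shared)
    return temporal_iou * spatial_iou


if __name__ == "__main__":
    gt = {f: (0, 0, 2, 2) for f in range(0, 4)}   # frames 0..3
    det = {f: (0, 0, 2, 2) for f in range(2, 6)}  # frames 2..5
    # Perfect spatial overlap, but only 2 of 6 frames are shared.
    print(tube_iou(gt, det) >= 0.2, tube_iou(gt, det) >= 0.5)
```

Under this convention a tube with good boxes but a poor temporal extent can pass the 0.2 threshold and still fail at 0.5, since the temporal and spatial terms multiply.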
