Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query

Dacheng Tao, Junchi Yan, Cheng Deng, Hao Wang

Abstract

Actor and action video segmentation from natural language query aims to selectively segment an actor and its action in a video based on an input textual description. Previous works mostly focus on learning a simple correlation between the two heterogeneous features of vision and language via dynamic convolution or fully convolutional classification. However, they ignore the linguistic variation of natural language queries and have difficulty modeling global visual context, which leads to unsatisfactory segmentation performance. To address these issues, we propose an asymmetric cross-guided attention network for actor and action video segmentation from natural language query. Specifically, the network consists of vision-guided language attention, which reduces the linguistic variation of the input query, and language-guided vision attention, which simultaneously incorporates query-focused global visual context. Moreover, we adopt a multi-resolution fusion scheme and a weighted loss for foreground and background pixels to further improve performance. Extensive experiments on Actor-Action Dataset Sentences and J-HMDB Sentences show that our proposed approach notably outperforms state-of-the-art methods.
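The abstract names two attention directions but does not give their equations. The following is a minimal sketch of the general idea only, assuming plain scaled dot-product attention, a mean-pooled visual summary as the guide for the language side, and the attended query as the guide for the vision side. All function names (`attend`, `cross_guided_attention`) and the toy features are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(items):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(items)
    return [sum(it[k] for it in items) / n for k in range(len(items[0]))]


def attend(guide, items):
    """Pool `items` with scaled dot-product weights computed against `guide`."""
    d = len(guide)
    weights = softmax([dot(guide, it) / math.sqrt(d) for it in items])
    pooled = [sum(w * it[k] for w, it in zip(weights, items)) for k in range(d)]
    return pooled, weights


def cross_guided_attention(word_feats, pixel_feats):
    # Assumption: the visual guide is a simple mean over all locations.
    visual_summary = mean_vector(pixel_feats)
    # Vision-guided language attention: re-weight words by visual relevance.
    query_vec, word_weights = attend(visual_summary, word_feats)
    # Language-guided vision attention: gather query-focused visual context.
    context_vec, pixel_weights = attend(query_vec, pixel_feats)
    return query_vec, context_vec, word_weights, pixel_weights


# Toy 2-D features: three words and four spatial locations.
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pixels = [[2.0, 0.0], [1.5, 0.1], [0.0, 0.2], [1.8, 0.0]]
query_vec, context_vec, word_weights, pixel_weights = cross_guided_attention(words, pixels)
```

In this sketch the two directions are not mirror images: the language side is guided by a fixed visual summary, while the vision side is guided by the already re-weighted query. That is one simple way to read "asymmetric", and the actual network may differ.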

Benchmarks

| Benchmark | Methodology | AP | IoU mean | IoU overall | Precision@0.5 | Precision@0.6 | Precision@0.7 | Precision@0.8 | Precision@0.9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| referring-expression-segmentation-on-a2d | ACGA | 0.274 | 0.490 | 0.601 | 0.557 | 0.459 | 0.319 | 0.16 | 0.02 |
| referring-expression-segmentation-on-j-hmdb | ACGA | 0.289 | 0.584 | 0.576 | 0.756 | 0.564 | 0.287 | 0.034 | 0.000 |
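The metric names in the benchmark results are not defined on this page. The sketch below assumes the definitions commonly used for referring segmentation: IoU overall is the total intersection divided by the total union across all samples, IoU mean is the average of per-sample IoU, and Precision@K is the fraction of samples whose IoU exceeds K. The function names, the handling of empty masks, and the toy masks are assumptions for illustration.

```python
def intersection_union(pred, gt):
    """Count intersecting and union pixels of two binary masks (lists of rows)."""
    inter = union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union


def evaluate(preds, gts, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    ious = []
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
        # Assumption: a sample with an empty union scores 0.
        ious.append(inter / union if union else 0.0)
    n = len(ious)
    return {
        "iou_overall": total_inter / total_union if total_union else 0.0,
        "iou_mean": sum(ious) / n,
        "precision": {t: sum(iou > t for iou in ious) / n for t in thresholds},
    }


# Two toy 2x2 samples: a perfect prediction and a poor one.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
gts = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
result = evaluate(preds, gts)
# Per-sample IoUs are 1.0 and 0.25, so iou_mean is 0.625,
# while iou_overall is (2 + 1) / (2 + 4) = 0.5.
```

Because IoU overall pools pixels across samples, large objects dominate it, whereas IoU mean treats every sample equally. Under these assumed definitions, that is one reason the two values can differ, as they do in the reported results.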
