Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries

Yi Yang, Fan Ma, Cheng Deng, Hao Wang

Abstract

Actor and action video segmentation with language queries aims to segment the objects in a video that a natural-language expression refers to. This task requires comprehensive language reasoning and fine-grained video understanding. Previous methods mainly rely on dynamic convolutional networks to match visual and semantic representations. However, dynamic convolution neglects spatial context when processing each region of a frame and therefore struggles to separate similar objects in complex scenes. To address this limitation, we construct a context modulated dynamic convolutional network. Specifically, we propose a context modulated dynamic convolutional operation whose kernels for a specific region are generated from both the language sentence and the surrounding context features. Moreover, we devise a temporal encoder that incorporates motion into the visual features so that they better match the query descriptions. Extensive experiments on two benchmark datasets, Actor-Action Dataset Sentences (A2D Sentences) and J-HMDB Sentences, demonstrate that our proposed approach notably outperforms state-of-the-art methods.
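To make the contrast between plain and context modulated dynamic convolution concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract only states that region kernels depend on both the sentence and the surrounding context, so the mean-pooled neighbourhood, the `tanh` gating, and all names (`local_context`, `w_lang`, `w_ctx`) are assumptions chosen for illustration.

```python
import numpy as np


def local_context(feat, k=3):
    """Mean of each k x k neighbourhood (zero padded). feat has shape (H, W, C)."""
    pad = k // 2
    padded = np.pad(feat, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = feat.shape
    out = np.zeros(feat.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + H, dx:dx + W]
    return out / (k * k)


def plain_dynamic_conv(feat, query, w_lang):
    """Baseline: one 1x1 kernel generated from the sentence, shared by all regions."""
    kernel = w_lang @ query              # (C,)
    return feat @ kernel                 # (H, W) response map


def context_modulated_dynamic_conv(feat, query, w_lang, w_ctx, k=3):
    """Assumed variant: the sentence kernel is rescaled per region by its context."""
    base = w_lang @ query                # (C,) language-generated kernel
    ctx = local_context(feat, k)         # (H, W, C) surrounding context features
    gate = np.tanh(ctx @ w_ctx.T)        # (H, W, C) context-dependent modulation
    kernels = base * (1.0 + gate)        # one kernel per spatial location
    return np.sum(kernels * feat, axis=-1)


rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 5, 8))        # toy frame features (H, W, C)
query = rng.normal(size=6)               # toy sentence embedding
w_lang = rng.normal(size=(8, 6))
w_ctx = rng.normal(size=(8, 8))
response = context_modulated_dynamic_conv(feat, query, w_lang, w_ctx)
```

In this sketch, setting `w_ctx` to zero removes the modulation and recovers the shared-kernel baseline, which is the behaviour the abstract describes as neglecting spatial context.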

Benchmarks

Benchmark: referring-expression-segmentation-on-a2d
Methodology: CMDy
Metrics:
  AP: 0.333
  IoU mean: 0.531
  IoU overall: 0.623
  Precision@0.5: 0.607
  Precision@0.6: 0.525
  Precision@0.7: 0.405
  Precision@0.8: 0.235
  Precision@0.9: 0.045

Benchmark: referring-expression-segmentation-on-j-hmdb
Methodology: CMDy
Metrics:
  AP: 0.301
  IoU mean: 0.576
  IoU overall: 0.554
  Precision@0.5: 0.742
  Precision@0.6: 0.587
  Precision@0.7: 0.316
  Precision@0.8: 0.047
  Precision@0.9: 0.000
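
The page does not define the metrics listed above. The following sketch shows how such numbers are commonly computed for referring segmentation, under these assumptions: "IoU mean" averages per-sample IoU, "IoU overall" divides total intersection by total union across all samples, and "Precision@K" is the fraction of samples whose IoU exceeds K. The function name `segmentation_metrics` is hypothetical, and AP is left out because its exact definition is not given here.

```python
import numpy as np


def segmentation_metrics(preds, gts, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Compute IoU-based metrics for paired lists of boolean masks."""
    inter = np.array([np.logical_and(p, g).sum() for p, g in zip(preds, gts)], dtype=float)
    union = np.array([np.logical_or(p, g).sum() for p, g in zip(preds, gts)], dtype=float)
    # Assumption: a sample with an empty union (both masks empty) scores IoU 0.
    ious = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    total_union = union.sum()
    results = {
        "IoU mean": float(ious.mean()),
        "IoU overall": float(inter.sum() / total_union) if total_union > 0 else 0.0,
    }
    for t in thresholds:
        results[f"Precision@{t}"] = float((ious > t).mean())
    return results


# Two toy samples: a perfect match and a half-overlapping prediction.
pred_a = np.array([[True, True], [True, True]])
gt_a = np.array([[True, True], [True, True]])
pred_b = np.array([[True, True], [False, False]])
gt_b = np.array([[True, False], [False, False]])
metrics = segmentation_metrics([pred_a, pred_b], [gt_a, gt_b])
```

Under these definitions, "IoU overall" weights samples by object size while "IoU mean" weights every sample equally, so the two can rank differently, as they do between the two benchmarks above.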
