
Context-Guided Spatio-Temporal Video Grounding

Xin Gu; Heng Fan; Yan Huang; Tiejian Luo; Libo Zhang


Abstract

The spatio-temporal video grounding (STVG) task aims to locate a spatio-temporal tube for a specific instance given a text query. Despite recent advances, current methods are easily affected by distractors or heavy variations in object appearance, because the text provides insufficient information about the object, and their performance degrades. To address this, we propose a novel framework, context-guided STVG (CG-STVG), which mines discriminative instance context for the object in the video and applies it as supplementary guidance for target localization. The key to CG-STVG lies in two specially designed modules: instance context generation (ICG), which focuses on discovering visual context information (in both appearance and motion) of the instance, and instance context refinement (ICR), which aims to improve the instance context from ICG by eliminating irrelevant or even harmful information from it. During grounding, ICG and ICR are deployed at each decoding stage of a Transformer architecture for instance context learning. In particular, the instance context learned at one decoding stage is fed to the next stage and used as guidance containing rich, discriminative object features to enhance target awareness in the decoding features, which in turn helps generate better instance context and ultimately improves localization. Compared to existing methods, CG-STVG exploits both the object information in the text query and the guidance from mined instance visual context for more accurate target localization. In experiments on three benchmarks, HCSTVG-v1/-v2 and VidSTG, CG-STVG sets new state-of-the-art results in m_tIoU and m_vIoU on all of them, showing its efficacy. The code will be released at https://github.com/HengLan/CGSTVG.
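To make the stage-to-stage data flow described in the abstract concrete, here is a toy sketch of a context-guided decoding loop. It is a hypothetical illustration under stated assumptions, not the authors' implementation: the function names, the attention-pooling stand-in for ICG, the top-k pruning stand-in for ICR, the additive guidance weight `alpha`, and `keep_ratio` are all invented for this example. The real modules are Transformer-based and work on appearance and motion features, which this sketch does not model.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def instance_context_generation(query: Vector, feats: List[Vector]) -> List[float]:
    """Hypothetical ICG stand-in: score each visual feature's relevance to the
    current (context-guided) query with softmax attention."""
    return softmax([dot(query, f) for f in feats])


def instance_context_refinement(
    feats: List[Vector], weights: List[float], keep_ratio: float = 0.5
) -> Vector:
    """Hypothetical ICR stand-in: drop low-attention features and re-pool the
    remaining ones into a refined instance context vector."""
    k = max(1, int(len(weights) * keep_ratio))
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    z = sum(weights[i] for i in top)
    dim = len(feats[0])
    return [sum(weights[i] * feats[i][d] for i in top) / z for d in range(dim)]


def context_guided_decoding(
    text_query: Vector, feats: List[Vector], num_stages: int = 3, alpha: float = 0.5
) -> Tuple[Vector, Vector]:
    """Run a toy multi-stage loop: the context refined at one stage guides the
    query of the next stage (assumed additive fusion with weight alpha)."""
    query = list(text_query)
    context = [0.0] * len(query)  # no instance context before the first stage
    for _ in range(num_stages):
        # Inject the previous stage's instance context into the decoding query.
        query = [q + alpha * c for q, c in zip(query, context)]
        weights = instance_context_generation(query, feats)
        context = instance_context_refinement(feats, weights)
    return query, context
```

For example, `context_guided_decoding([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])` keeps selecting the first feature as the most relevant one, so the refined context stays on it while the query is progressively pulled toward it. The additive fusion was chosen only because it is the simplest way to show context from stage s guiding stage s+1.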

Code Repositories

henglan/cgstvg (official implementation, PyTorch, mentioned in GitHub)
shaohuadong2021/dplnet (PyTorch, mentioned in GitHub)

Benchmarks

Method: CG-STVG

HC-STVG v1 (spatio-temporal-video-grounding-on-hc-stvg1)
  m_vIoU: 38.4
  vIoU@0.3: 61.5
  vIoU@0.5: 36.3

HC-STVG v2, validation set (spatio-temporal-video-grounding-on-hc-stvg2)
  m_vIoU: 39.5
  vIoU@0.3: 64.5
  vIoU@0.5: 36.3

VidSTG (spatio-temporal-video-grounding-on-vidstg)
  Declarative sentences:
    m_vIoU: 34.0
    vIoU@0.3: 47.7
    vIoU@0.5: 33.1
  Interrogative sentences:
    m_vIoU: 29.0
    vIoU@0.3: 40.5
    vIoU@0.5: 27.5
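For reference, the metrics above follow the vIoU definition commonly used in STVG evaluation (introduced with VidSTG): with S_i and S_u the intersection and union of the predicted and ground-truth frame sets, vIoU = (1/|S_u|) * sum over t in S_i of IoU(predicted box_t, ground-truth box_t); tIoU = |S_i| / |S_u|; m_vIoU averages vIoU over all test samples, and vIoU@R is the percentage of samples whose vIoU exceeds R. The sketch below assumes tubes are stored as dicts mapping frame index to an (x1, y1, x2, y2) box; the helper names are hypothetical, and official evaluation scripts may differ in details such as box conventions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) format
Tube = Dict[int, Box]                    # assumed frame index -> box mapping


def box_iou(a: Box, b: Box) -> float:
    """Standard IoU of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tiou(pred: Tube, gt: Tube) -> float:
    """Temporal IoU: |S_i| / |S_u| over frame indices."""
    s_u = pred.keys() | gt.keys()
    return len(pred.keys() & gt.keys()) / len(s_u) if s_u else 0.0


def viou(pred: Tube, gt: Tube) -> float:
    """Spatio-temporal IoU: per-frame box IoU summed over S_i, divided by |S_u|."""
    s_i = pred.keys() & gt.keys()
    s_u = pred.keys() | gt.keys()
    if not s_u:
        return 0.0
    return sum(box_iou(pred[t], gt[t]) for t in s_i) / len(s_u)


def summarize(samples: List[Tuple[Tube, Tube]], thresholds=(0.3, 0.5)) -> Dict[str, float]:
    """Report m_vIoU, m_tIoU and vIoU@R as percentages over (pred, gt) pairs."""
    v = [viou(p, g) for p, g in samples]
    t = [tiou(p, g) for p, g in samples]
    n = len(samples)
    result = {"m_vIoU": 100.0 * sum(v) / n, "m_tIoU": 100.0 * sum(t) / n}
    for r in thresholds:
        result[f"vIoU@{r}"] = 100.0 * sum(1 for s in v if s > r) / n
    return result
```

Because frames outside S_i still count in the denominator |S_u|, a prediction with perfect boxes but a misaligned time span is penalized, which is why vIoU couples spatial and temporal accuracy.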