4 months ago

FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding

Cao Zhuo ; Zhang Bingqing ; Du Heming ; Yu Xin ; Li Xue ; Wang Sen

Abstract

Text-guided Video Temporal Grounding (VTG) aims to localize relevant segments in untrimmed videos based on textual descriptions, encompassing two subtasks: Moment Retrieval (MR) and Highlight Detection (HD). Although previous typical methods have achieved commendable results, it is still challenging to retrieve short video moments. This is primarily due to the reliance on sparse and limited decoder queries, which significantly constrain the accuracy of predictions. Furthermore, suboptimal outcomes often arise because previous methods rank predictions based on isolated predictions, neglecting the broader video context. To tackle these issues, we introduce FlashVTG, a framework featuring a Temporal Feature Layering (TFL) module and an Adaptive Score Refinement (ASR) module. The TFL module replaces the traditional decoder structure to capture nuanced video content variations across multiple temporal scales, while the ASR module improves prediction ranking by integrating context from adjacent moments and multi-temporal-scale features. Extensive experiments demonstrate that FlashVTG achieves state-of-the-art performance on four widely adopted datasets in both MR and HD. Specifically, on the QVHighlights dataset, it boosts mAP by 5.8% for MR and 3.3% for HD. For short-moment retrieval, FlashVTG increases mAP to 125% of previous SOTA performance. All these improvements are made without adding training burdens, underscoring its effectiveness. Our code is available at https://github.com/Zhuo-Cao/FlashVTG.
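
The abstract refers to features at "multiple temporal scales". As a rough illustration of that general idea only, the sketch below builds a temporal pyramid by repeated stride-2 average pooling over a sequence of per-clip values. It is not the authors' TFL module: the function name `temporal_pyramid`, the use of scalar features, and the choice of average pooling are all assumptions made for this example.

```python
from typing import List


def temporal_pyramid(features: List[float], num_levels: int) -> List[List[float]]:
    """Build a multi-scale view of a 1-D clip-level feature sequence.

    Hypothetical helper for illustration; not taken from the FlashVTG code.
    Level 0 is the input; each further level halves the temporal length
    by averaging non-overlapping pairs of neighbouring clips.
    """
    levels = [list(features)]
    for _ in range(num_levels - 1):
        prev = levels[-1]
        if len(prev) < 2:
            break  # nothing left to pool
        # Average non-overlapping pairs; a trailing odd element is dropped.
        pooled = [(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev) - 1, 2)]
        levels.append(pooled)
    return levels


if __name__ == "__main__":
    for level in temporal_pyramid([1, 2, 3, 4, 5, 6, 7, 8], num_levels=3):
        print(level)
```

Coarser levels summarize longer stretches of video per element, while the finest level keeps clip-level detail, which is the kind of resolution short moments need.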

Code Repositories

zhuo-cao/flashvtg
Official
pytorch

Benchmarks

Benchmark                                    Methodology  Metrics
highlight-detection-on-qvhighlights          FlashVTG     Hit@1: 71.01; mAP: 44.09
highlight-detection-on-tvsum                 FlashVTG     mAP: 88
highlight-detection-on-youtube-highlights    FlashVTG     mAP: 75.4
moment-retrieval-on-charades-sta             FlashVTG     R@1 IoU=0.5: 70.32; R@1 IoU=0.7: 49.87
moment-retrieval-on-qvhighlights             FlashVTG     R@1 IoU=0.5: 70.69; R@1 IoU=0.7: 53.96; mAP: 52.00; mAP@0.5: 72.33; mAP@0.75: 53.85
natural-language-moment-retrieval-on-tacos   FlashVTG     R@1,IoU=0.3: 53.71; R@1,IoU=0.5: 41.76; R@1,IoU=0.7: 24.74; mIoU: 37.61
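
For readers unfamiliar with the moment-retrieval metrics above, here is a minimal sketch of how temporal IoU and R@1 at an IoU threshold are commonly computed. It assumes one ground-truth span per query and one top-ranked prediction per query; the function names are hypothetical, and the official benchmark scripts may differ (for example, in how they handle queries with several ground-truth moments).

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Span, b: Span) -> float:
    """Intersection over union of two time spans."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span], thr: float) -> float:
    """Percentage of queries whose top-1 prediction reaches IoU >= thr.

    Assumes top1_preds[i] and gts[i] belong to the same query.
    """
    hits = sum(temporal_iou(p, g) >= thr for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    preds = [(0.0, 10.0), (2.0, 8.0)]
    gts = [(0.0, 10.0), (5.0, 15.0)]
    print(recall_at_1(preds, gts, thr=0.5))
```

In this toy input the first prediction matches its ground truth exactly, while the second overlaps by only 3 seconds out of a 13-second union, so it misses the 0.5 threshold.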
