RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos

Hannan Tanveer ; Islam Md Mohaiminul ; Seidl Thomas ; Bertasius Gedas

Abstract

Locating specific moments within long videos (20-120 minutes) presents a significant challenge, akin to finding a needle in a haystack. Adapting existing short video (5-30 seconds) grounding methods to this problem yields poor performance. Since most real life videos, such as those on YouTube and AR/VR, are lengthy, addressing this issue is crucial. Existing methods typically operate in two stages: clip retrieval and grounding. However, this disjoint process limits the retrieval module's fine-grained event understanding, crucial for specific moment detection. We propose RGNet which deeply integrates clip retrieval and grounding into a single network capable of processing long videos into multiple granular levels, e.g., clips and frames. Its core component is a novel transformer encoder, RG-Encoder, that unifies the two stages through shared features and mutual optimization. The encoder incorporates a sparse attention mechanism and an attention loss to model both granularity jointly. Moreover, we introduce a contrastive clip sampling technique to mimic the long video paradigm closely during training. RGNet surpasses prior methods, showcasing state-of-the-art performance on long video temporal grounding (LVTG) datasets MAD and Ego4D.
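The two granularity levels mentioned in the abstract (clips and frames) can be illustrated with a minimal sketch that partitions a long video's frame indices into clips. This is not RGNet's implementation: the function name `split_into_clips` and the `clip_len` and `stride` parameters are hypothetical, and the abstract does not state how clips are actually formed.

```python
from typing import List, Tuple


def split_into_clips(num_frames: int, clip_len: int, stride: int) -> List[Tuple[int, int]]:
    """Partition frame indices [0, num_frames) into clips.

    Returns half-open (start, end) frame ranges. A stride smaller than
    clip_len produces overlapping clips. All parameters are illustrative
    assumptions, not values taken from the paper.
    """
    if num_frames <= 0 or clip_len <= 0 or stride <= 0:
        raise ValueError("num_frames, clip_len and stride must be positive")

    clips: List[Tuple[int, int]] = []
    start = 0
    while start < num_frames:
        # The last clip is truncated at the end of the video.
        end = min(start + clip_len, num_frames)
        clips.append((start, end))
        if end == num_frames:
            break
        start += stride
    return clips


if __name__ == "__main__":
    # Ten frames, non-overlapping clips of four frames each.
    print(split_into_clips(num_frames=10, clip_len=4, stride=4))
```

Under this sketch, a retrieval stage would score whole clips against the query, while a grounding stage would score the individual frames inside the selected clips.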

Code Repositories

tanveer81/revisionllm (PyTorch; mentioned in GitHub)
tanveer81/rgnet (official; PyTorch; mentioned in GitHub)

Benchmarks

Benchmark: natural-language-moment-retrieval-on-mad
Methodology: RGNet
Metrics:
R@1, IoU=0.1: 12.43
R@1, IoU=0.3: 9.48
R@1, IoU=0.5: 5.61
R@5, IoU=0.1: 25.12
R@5, IoU=0.3: 18.72
R@5, IoU=0.5: 10.86
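
The metric names above follow the usual temporal grounding convention: R@k, IoU=m is the share of queries for which at least one of the top-k predicted moments overlaps the ground-truth moment with temporal IoU of at least m. The sketch below shows that computation under stated assumptions: the function names are hypothetical, the result is reported as a percentage, and the threshold comparison is taken as inclusive (`>=`), which may differ from the official evaluation code for MAD.

```python
from typing import Sequence, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    ranked_preds: Sequence[Sequence[Span]],
    gts: Sequence[Span],
    k: int,
    iou_threshold: float,
) -> float:
    """Percentage of queries with a hit among the top-k predictions.

    ranked_preds[i] holds the predictions for query i, best first.
    The inclusive comparison (>=) is an assumption of this sketch.
    """
    if not gts:
        raise ValueError("at least one query is required")
    hits = 0
    for preds, gt in zip(ranked_preds, gts):
        if any(temporal_iou(p, gt) >= iou_threshold for p in preds[:k]):
            hits += 1
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    # Two toy queries; the spans are made up for illustration only.
    preds = [[(0.0, 10.0), (5.0, 15.0)], [(100.0, 110.0)]]
    gts = [(5.0, 15.0), (0.0, 10.0)]
    print(recall_at_k(preds, gts, k=1, iou_threshold=0.3))
    print(recall_at_k(preds, gts, k=5, iou_threshold=0.5))
```

Because a hit at a higher IoU threshold is also a hit at every lower one, and the top-5 list contains the top-1 prediction, the reported values decrease with the threshold and increase with k, as the numbers above do.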
