Saliency-Guided DETR for Moment Retrieval and Highlight Detection

Gordeev Aleksandr ; Dokholyan Vladimir ; Tolstykh Irina ; Kuprashevich Maksim

Abstract

Existing approaches for video moment retrieval and highlight detection are not able to align text and video features efficiently, resulting in unsatisfying performance and limited production usage. To address this, we propose a novel architecture that utilizes recent foundational video models designed for such alignment. Combined with the introduced Saliency-Guided Cross Attention mechanism and a hybrid DETR architecture, our approach significantly enhances performance in both moment retrieval and highlight detection tasks. For even better improvement, we developed InterVid-MR, a large-scale and high-quality dataset for pretraining. Using it, our architecture achieves state-of-the-art results on the QVHighlights, Charades-STA and TACoS benchmarks. The proposed approach provides an efficient and scalable solution for both zero-shot and fine-tuning scenarios in video-language tasks.

Benchmarks

Benchmark | Methodology | Metrics
highlight-detection-on-qvhighlights | SG-DETR | Hit@1: 69.13; mAP: 43.76
highlight-detection-on-qvhighlights | SG-DETR (w/ PT) | Hit@1: 71.00; mAP: 44.70
highlight-detection-on-tvsum | SG-DETR | mAP: 87.1
highlight-detection-on-youtube-highlights | SG-DETR | mAP: 76.7
highlight-detection-on-youtube-highlights | SG-DETR (w/ PT) | mAP: 78.0
moment-retrieval-on-charades-sta | SG-DETR (w/ PT) | R@1 IoU=0.5: 71.10; R@1 IoU=0.7: 52.80
moment-retrieval-on-charades-sta | SG-DETR | R@1 IoU=0.5: 70.20; R@1 IoU=0.7: 49.50
moment-retrieval-on-qvhighlights | SG-DETR | R@1 IoU=0.5: 72.20; R@1 IoU=0.7: 56.60; mAP: 54.10; mAP@0.5: 73.20; mAP@0.75: 55.80
moment-retrieval-on-qvhighlights | SG-DETR (w/ PT) | R@1 IoU=0.5: 74.20; R@1 IoU=0.7: 60.40; mAP: 58.80; mAP@0.5: 76.20; mAP@0.75: 60.80
natural-language-moment-retrieval-on-tacos | SG-DETR | R@1,IoU=0.3: 56.71; R@1,IoU=0.5: 44.70; R@1,IoU=0.7: 29.90; mIoU: 40.90
natural-language-moment-retrieval-on-tacos | SG-DETR (w/ PT) | R@1,IoU=0.3: 58.10; R@1,IoU=0.5: 46.40; R@1,IoU=0.7: 33.90; mIoU: 42.40
zero-shot-moment-retrieval-on-qvhighlights | SG-DETR (ZS) | R1@0.5: 63.90; R1@0.7: 49.60; mAP: 48.30; mAP@0.5: 67.50; mAP@0.75: 49.00
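
For readers unfamiliar with the moment-retrieval metrics listed above, the following is a minimal sketch of how R@1 at an IoU threshold and mIoU are commonly computed from temporal intersection-over-union. It is not the authors' evaluation code. The function names (`temporal_iou`, `recall_at_1`, `mean_iou`) and the sample spans are hypothetical, and the sketch assumes exactly one ground-truth moment per query, which may not match every benchmark's official protocol.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) pair; this representation is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span], iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


def mean_iou(top1_preds: List[Span], gts: List[Span]) -> float:
    """Average IoU of the top-1 predictions, as a percentage."""
    total = sum(temporal_iou(p, g) for p, g in zip(top1_preds, gts))
    return 100.0 * total / len(gts)


# Illustrative data only: three queries, one top-1 prediction each.
preds = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
gts = [(0.0, 10.0), (10.0, 20.0), (0.0, 5.0)]

print(round(recall_at_1(preds, gts, 0.3), 2))  # 66.67
print(round(recall_at_1(preds, gts, 0.5), 2))  # 33.33
print(round(mean_iou(preds, gts), 2))          # 44.44
```

In this toy example the per-query IoUs are 1.0, 1/3 and 0, so two of three predictions pass the 0.3 threshold and one passes 0.5.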
