4 months ago

LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection

Zhao Pengcheng ; He Zhixian ; Zhang Fuwei ; Lin Shujin ; Zhou Fan

Abstract

Video Moment Retrieval and Highlight Detection aim to find corresponding content in the video based on a text query. Existing models usually first use contrastive learning methods to align video and text features, then fuse and extract multimodal information, and finally use a Transformer Decoder to decode multimodal information. However, existing methods face several issues: (1) Overlapping semantic information between different samples in the dataset hinders the model's multimodal aligning performance; (2) Existing models are not able to efficiently extract local features of the video; (3) The Transformer Decoder used by the existing model cannot adequately decode multimodal features. To address the above issues, we proposed the LD-DETR model for Video Moment Retrieval and Highlight Detection tasks. Specifically, we first distilled the similarity matrix into the identity matrix to mitigate the impact of overlapping semantic information. Then, we designed a method that enables convolutional layers to extract multimodal local features more efficiently. Finally, we fed the output of the Transformer Decoder back into itself to adequately decode multimodal information. We evaluated LD-DETR on four public benchmarks and conducted extensive experiments to demonstrate the superiority and effectiveness of our approach. Our model outperforms the State-Of-The-Art models on QVHighlight, Charades-STA and TACoS datasets. Our code is available at https://github.com/qingchen239/ld-detr.
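
The idea of feeding a decoder's output back into itself can be illustrated with a small, framework-free sketch. This is not the LD-DETR implementation: the abstract does not say which tensors are fed back or how many passes are used, so `loop_decode`, `toy_decoder`, and `num_loops` are hypothetical names, and the toy decoder is only a stand-in for a real Transformer Decoder.

```python
from typing import Callable, List

Vector = List[float]
Decoder = Callable[[List[Vector], List[Vector]], List[Vector]]


def loop_decode(decoder: Decoder, queries: List[Vector],
                memory: List[Vector], num_loops: int) -> List[Vector]:
    """Apply the same decoder repeatedly, feeding its output back as input.

    Assumption: the decoded queries are what gets fed back, while the
    multimodal memory stays fixed across passes.
    """
    if num_loops < 1:
        raise ValueError("num_loops must be at least 1")
    out = queries
    for _ in range(num_loops):
        out = decoder(out, memory)
    return out


def toy_decoder(queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Stand-in decoder: moves each query halfway toward the memory mean."""
    dim = len(memory[0])
    mean = [sum(m[d] for m in memory) / len(memory) for d in range(dim)]
    return [[0.5 * q[d] + 0.5 * mean[d] for d in range(dim)] for q in queries]


memory = [[2.0, 4.0], [4.0, 8.0]]   # memory mean is [3.0, 6.0]
queries = [[0.0, 0.0]]
refined = loop_decode(toy_decoder, queries, memory, num_loops=3)
```

With this toy decoder, each extra pass moves the query closer to the memory mean, which mirrors the intuition that repeated decoding refines the result. A real model would replace `toy_decoder` with a trained decoder module.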

Code Repositories

qingchen239/ld-detr (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
moment-retrieval-on-charades-sta | LD-DETR | R@1 IoU=0.3: 73.92; R@1 IoU=0.5: 62.58; R@1 IoU=0.7: 41.56; mIoU: 53.44
moment-retrieval-on-qvhighlights | LD-DETR | R@1 IoU=0.5: 66.80; R@1 IoU=0.7: 51.04; mAP: 46.41; mAP@0.5: 67.61; mAP@0.75: 46.99
natural-language-moment-retrieval-on-tacos | LD-DETR | R@1 IoU=0.3: 57.61; R@1 IoU=0.5: 44.31; R@1 IoU=0.7: 26.24; mIoU: 40.30
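
For readers unfamiliar with the R@1 and mIoU figures above, the following minimal sketch shows how such metrics are commonly computed for moment retrieval. It assumes each moment is a `(start, end)` pair in seconds and that only the top-ranked prediction per query is scored. The function names are illustrative, and the official evaluation scripts of each benchmark may differ in details.

```python
from typing import List, Tuple

Moment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Moment], gts: List[Moment],
                threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(1 for p, g in zip(top1_preds, gts)
               if temporal_iou(p, g) >= threshold)
    return 100.0 * hits / len(gts)


def mean_iou(top1_preds: List[Moment], gts: List[Moment]) -> float:
    """Average IoU of the top-1 predictions, as a percentage."""
    total = sum(temporal_iou(p, g) for p, g in zip(top1_preds, gts))
    return 100.0 * total / len(gts)


# Three made-up queries: a perfect match, a partial overlap, and a miss.
preds = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
gts = [(0.0, 10.0), (10.0, 20.0), (40.0, 50.0)]
r1_at_03 = recall_at_1(preds, gts, 0.3)
r1_at_05 = recall_at_1(preds, gts, 0.5)
miou = mean_iou(preds, gts)
```

In this made-up example the partial overlap has an IoU of 1/3, so it counts as a hit at the 0.3 threshold but not at 0.5. This is why R@1 drops as the IoU threshold tightens in the table.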
