

Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

Jiang Yiyang ; Zhang Wengyu ; Zhang Xulu ; Wei Xiaoyong ; Chen Chang Wen ; Li Qing

Abstract

In this paper, we investigate the feasibility of leveraging large language models (LLMs) to integrate general knowledge and to incorporate pseudo-events as priors for temporal content distribution in video moment retrieval (VMR) models. The motivation for this study is a limitation of using LLMs as decoders that generate discrete textual descriptions: this hinders their direct application to continuous outputs such as salience scores and inter-frame embeddings that capture inter-frame relations. To overcome this limitation, we propose using LLM encoders instead of decoders. Through a feasibility study, we demonstrate that LLM encoders effectively refine inter-concept relations in multimodal embeddings, even without being trained on textual embeddings. We also show that this refinement capability can be transferred to other embeddings, such as BLIP and T5, as long as these embeddings exhibit inter-concept similarity patterns similar to those of CLIP embeddings. We present a general framework for integrating LLM encoders into existing VMR architectures, specifically within the fusion module. Through experimental validation, we demonstrate the effectiveness of the proposed methods, which achieve state-of-the-art performance in VMR. The source code is available at https://github.com/fletcherjiang/LLMEPET.
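The abstract states that the refinement transfers to BLIP and T5 embeddings when they show inter-concept similarity patterns similar to CLIP's, but it does not say how that similarity is measured. One plausible check, offered here only as a sketch and not as the authors' procedure, is a representational-similarity-style comparison: compute the pairwise cosine similarities among the same set of concepts under each encoder, then correlate the two resulting patterns. The function names (`similarity_pattern`, `pattern_agreement`) and the toy vectors are hypothetical; real inputs would be the embeddings each encoder produces for a shared list of concepts.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def similarity_pattern(embeddings):
    """Cosine similarity for every unordered concept pair (i < j), in a fixed order."""
    return [cosine(embeddings[i], embeddings[j])
            for i, j in combinations(range(len(embeddings)), 2)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def pattern_agreement(emb_a, emb_b):
    """Correlate the inter-concept similarity patterns of two encoders.

    emb_a[k] and emb_b[k] must embed the same concept k; the two encoders
    may use different embedding dimensions. Needs at least three concepts
    so that each pattern has more than one pair.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both encoders must embed the same concepts")
    if len(emb_a) < 3:
        raise ValueError("need at least three concepts")
    return pearson(similarity_pattern(emb_a), similarity_pattern(emb_b))


# Hypothetical toy embeddings for three concepts.
clip_like = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
# Same geometry in 3-D with coordinates permuted, so all cosines are preserved.
same_geometry = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [1.0, 0.0, 0.0]]
# Concepts 0 and 2 are now the close pair instead of 0 and 1.
different_geometry = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

print(round(pattern_agreement(clip_like, same_geometry), 3))       # → 1.0
print(round(pattern_agreement(clip_like, different_geometry), 3))  # → -0.664
```

A value near 1 means the two encoders rank concept pairs alike even when their dimensions differ. Pearson correlation is one choice; a rank correlation such as Spearman's would be less sensitive to differences in how each encoder spreads its similarity values.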

Code Repositories

fletcherjiang/llmepet (official implementation, PyTorch)

Benchmarks

All results are reported for LLMEPET; scores are percentages.

Highlight detection on QVHighlights: Hit@1 65.69, mAP 40.33
Highlight detection on YouTube Highlights: mAP 75.3
Moment retrieval on Charades-STA: R@1 (IoU=0.5) 58.31, R@1 (IoU=0.7) 36.49
Moment retrieval on QVHighlights: R@1 (IoU=0.5) 66.73, R@1 (IoU=0.7) 49.94, mAP 44.05, mAP@0.5 65.76, mAP@0.75 43.91
Natural language moment retrieval on TACoS: R@1 (IoU=0.3) 52.73, R@1 (IoU=0.5) 40.12, R@1 (IoU=0.7) 22.78, mIoU 36.55
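The moment-retrieval scores follow the usual VMR conventions: R@1 (IoU=θ) is the percentage of queries whose top-ranked predicted moment reaches a temporal IoU of at least θ with the ground-truth moment, and mIoU is the mean IoU of the top-ranked prediction. The sketch below computes both under the simplifying assumption of one ground-truth interval per query; it is not the official evaluation script, does not cover mAP or the highlight-detection metrics, and its helper names (`temporal_iou`, `recall_at_1`, `mean_iou`) are hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (start, end), e.g. in seconds


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1: Sequence[Interval], gts: Sequence[Interval],
                threshold: float) -> float:
    """R@1 at an IoU threshold, in percent (one ground-truth interval per query)."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(top1, gts))
    return 100.0 * hits / len(gts)


def mean_iou(top1: Sequence[Interval], gts: Sequence[Interval]) -> float:
    """Mean IoU of the top-1 predictions, in percent."""
    return 100.0 * sum(temporal_iou(p, g) for p, g in zip(top1, gts)) / len(gts)


# Hypothetical top-1 predictions and ground-truth moments for three queries.
preds = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]  # IoUs: 1.0, 1/3, 0.0
gts = [(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]

print(round(recall_at_1(preds, gts, 0.5), 2))  # → 33.33
print(round(recall_at_1(preds, gts, 0.3), 2))  # → 66.67
print(round(mean_iou(preds, gts), 2))          # → 44.44
```

The union is computed from the two interval lengths rather than from their outer span, so the formula stays correct for disjoint intervals as well as overlapping ones.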
