5 months ago

UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

Zeng Yingsen ; Zhong Yujie ; Feng Chengjian ; Ma Lin

Abstract

Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos. Although the two tasks focus on different events, we observe that they are closely connected. For instance, most descriptions in MR involve multiple actions from TAD. In this paper, we investigate the potential synergy between TAD and MR. First, we propose a unified architecture, termed Unified Moment Detection (UniMD), for both TAD and MR. It transforms the inputs of the two tasks, namely actions for TAD or events for MR, into a common embedding space, and utilizes two novel query-dependent decoders to generate a uniform output of classification scores and temporal segments. Second, we explore the efficacy of two task fusion learning approaches, pre-training and co-training, to enhance the mutual benefits between TAD and MR. Extensive experiments demonstrate that the proposed task fusion learning scheme enables the two tasks to help each other and outperform their separately trained counterparts. Impressively, UniMD achieves state-of-the-art results on three paired datasets: Ego4D, Charades-STA, and ActivityNet. Our code is available at https://github.com/yingsen1/UniMD.

Code Repositories

yingsen1/unimd (official, PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| action-detection-on-charades | UniMD+Sync. (RGB+Flow) | mAP: 26.53 |
| moment-retrieval-on-charades-sta | UniMD+Sync. | R@1 IoU=0.5: 63.98; R@1 IoU=0.7: 44.46; R@5 IoU=0.5: 91.94; R@5 IoU=0.7: 67.72 |
| natural-language-moment-retrieval-on | UniMD+Sync. | R@5,IoU=0.5: 80.54; R@5,IoU=0.7: 57.04 |
| temporal-action-localization-on-activitynet | UniMD+Sync. | mAP: 39.83; mAP IOU@0.5: 60.29 |
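
The moment-retrieval rows above report R@K at an IoU threshold. As a rough illustration of how such a number is commonly computed, the sketch below scores ranked (start, end) predictions against one ground-truth segment per query. The function names, the toy data, and the "IoU greater than or equal to the threshold" convention are assumptions made for illustration; this is not the evaluation script behind the reported results.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    predictions: List[List[Segment]],
    ground_truth: List[Segment],
    k: int,
    iou_threshold: float,
) -> float:
    """Percentage of queries whose top-k predictions contain at least
    one segment with IoU >= iou_threshold (assumed convention)."""
    if not ground_truth:
        return 0.0
    hits = 0
    for ranked, gt in zip(predictions, ground_truth):
        # `ranked` is assumed to be sorted by descending confidence.
        if any(temporal_iou(p, gt) >= iou_threshold for p in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(ground_truth)


# Toy data (hypothetical): two queries, two ranked predictions each.
toy_predictions = [
    [(0.0, 10.0), (2.0, 8.0)],
    [(5.0, 6.0), (20.0, 30.0)],
]
toy_ground_truth = [(2.0, 8.0), (21.0, 29.0)]

r1_05 = recall_at_k(toy_predictions, toy_ground_truth, k=1, iou_threshold=0.5)
r5_07 = recall_at_k(toy_predictions, toy_ground_truth, k=5, iou_threshold=0.7)
print(f"R@1 IoU=0.5: {r1_05:.2f}")
print(f"R@5 IoU=0.7: {r5_07:.2f}")
```

On the toy data, only the first query has a top-1 prediction overlapping its ground truth by at least 0.5, while both queries have a match within the top 5 at threshold 0.7, so a larger K can only keep or raise recall at a fixed threshold.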
