HyperAI


5 months ago

Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval

Kim Minkuk; Kim Hyeon Bae; Moon Jinyoung; Choi Jinwoo; Kim Seong Tae


Abstract

Dense video captioning, which aims to automatically localize and caption all events within an untrimmed video, has attracted significant research attention. Several studies formulate dense video captioning as a multi-task problem of event localization and event captioning in order to model inter-task relations. However, addressing both tasks using only visual input is challenging due to the lack of semantic content. In this study, we address this problem by proposing a novel framework inspired by human cognitive information processing. Our model uses an external memory to incorporate prior knowledge, and we propose a memory retrieval method based on cross-modal video-to-text matching. To effectively incorporate the retrieved text features, we design a versatile encoder and a decoder with visual and textual cross-attention modules. Comparative experiments on the ActivityNet Captions and YouCook2 datasets demonstrate the effectiveness of the proposed method. Experimental results show promising performance of our model without extensive pretraining on a large video dataset.
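The cross-modal retrieval step can be pictured with a minimal sketch. It assumes that video-segment features and sentence embeddings already live in one shared embedding space (for example, produced by a CLIP-style dual encoder) and ranks memory entries by cosine similarity to a segment feature. The memory contents, the vector sizes, and the names cosine and retrieve_text_memory are hypothetical; this is an illustration of the idea, not the authors' implementation in ailab-kyunghee/cm2_dvc.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_text_memory(
    video_feat: Sequence[float],
    memory: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k memory sentences whose embeddings best match a video feature.

    Assumption (hypothetical): video and text embeddings share one space, so
    cosine similarity is a meaningful video-to-text matching score.
    """
    scored = [(text, cosine(video_feat, emb)) for text, emb in memory]
    scored.sort(key=lambda item: item[1], reverse=True)  # best match first
    return scored[:k]


# Toy external memory of (sentence, embedding) pairs; the numbers are made up.
memory = [
    ("a man slices an onion", [0.9, 0.1, 0.0]),
    ("a dog runs on the beach", [0.0, 0.2, 0.9]),
    ("someone chops vegetables", [0.8, 0.3, 0.1]),
]
segment_feature = [1.0, 0.2, 0.0]  # hypothetical feature of one video segment

for text, score in retrieve_text_memory(segment_feature, memory, k=2):
    print(f"{score:.3f}  {text}")
```

Exact nearest-neighbour search over a Python list is used only to keep the sketch self-contained; a large memory would normally call for a vector index.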
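The idea of attending to both visual and retrieved textual features can be illustrated with a second minimal sketch: single-head scaled dot-product cross-attention is applied once over visual features and once over text features, and the two results are summed. The summation, the names softmax, cross_attention and fuse, and the omission of learned projections, multiple heads, residual connections and layer normalization are simplifying assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Vector:
    """Single-head scaled dot-product attention of one query over key/value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of the value vectors, one output coordinate at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def fuse(
    query: Sequence[float],
    visual_feats: Sequence[Sequence[float]],
    text_feats: Sequence[Sequence[float]],
) -> Vector:
    """Hypothetical fusion: attend to each modality separately, then add the results.

    Keys and values are the raw features here; a real decoder would apply
    learned projections first.
    """
    visual_ctx = cross_attention(query, visual_feats, visual_feats)
    text_ctx = cross_attention(query, text_feats, text_feats)
    return [a + b for a, b in zip(visual_ctx, text_ctx)]


query = [1.0, 0.0]                    # hypothetical decoder state
visual = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical visual features
text = [[0.5, 0.5]]                   # hypothetical retrieved text features
print(fuse(query, visual, text))
```

Keeping separate attention calls per modality mirrors the abstract's description of distinct visual and textual cross-attention modules; how the paper actually combines their outputs is not stated on this page.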

Code Repositories

ailab-kyunghee/cm2_dvc (official, PyTorch; mentioned in GitHub)

Benchmarks

Benchmark                               Method  BLEU4  CIDEr  F1     METEOR  Precision  Recall  SODA
dense-video-captioning-on-activitynet   CM²     2.38   33.01  55.21  8.55    56.81      53.71   6.18
dense-video-captioning-on-youcook2      CM²     1.63   31.66  28.43  6.08    33.38      24.76   5.34
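As a sanity check on the benchmark numbers, the F1 values can be recomputed from the listed precision and recall, assuming F1 is their harmonic mean (the usual definition; the page does not say how it was computed). The function name f1_score is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for CM², copied from the benchmark listing.
reported = {
    "ActivityNet": (56.81, 53.71, 55.21),
    "YouCook2": (33.38, 24.76, 28.43),
}

for name, (p, r, f1_reported) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.2f}, reported F1 = {f1_reported}")
```

Both recomputed values agree with the reported F1 scores to within the rounding of the two-decimal precision and recall inputs.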
