MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal

Abstract

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks because it requires not only visual relevance but also discourse-based coherence across the sentences in the paragraph. Towards this goal, we propose a new approach called Memory-Augmented Recurrent Transformer (MART), which uses a memory module to augment the transformer architecture. The memory module generates a highly summarized memory state from the video segments and the sentence history to help better predict the next sentence (with respect to coreference and repetition), thus encouraging coherent paragraph generation. Extensive experiments, human evaluations, and qualitative analyses on two popular datasets, ActivityNet Captions and YouCookII, show that MART generates more coherent and less repetitive paragraph captions than baseline methods, while maintaining relevance to the input video events. All code is open source and available at: https://github.com/jayleicn/recurrent-transformer
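
The abstract does not say how the memory state is computed, so the following is only a minimal sketch of the general idea: a fixed-size memory that is updated once per video segment and carried forward to the next one. It assumes a GRU-style gated update and mean pooling as the segment summary. Both are illustrative choices, and every name here (GatedMemory, mean_pool, run_memory) is hypothetical rather than taken from the official repository. MART itself augments a transformer, which this sketch does not attempt to reproduce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def mean_pool(vectors):
    """Assumed stand-in for a learned summary of one segment's hidden states."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


class GatedMemory:
    """Hypothetical GRU-style memory cell (an assumption, not the paper's code)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.W_mc = init_matrix(dim, dim, rng)  # memory  -> candidate
        self.W_sc = init_matrix(dim, dim, rng)  # summary -> candidate
        self.W_mz = init_matrix(dim, dim, rng)  # memory  -> gate
        self.W_sz = init_matrix(dim, dim, rng)  # summary -> gate

    def update(self, memory, summary):
        # Candidate memory mixes the old memory with the new segment summary.
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W_mc, memory), matvec(self.W_sc, summary))]
        # The gate decides, per dimension, how much old memory to keep.
        gate = [sigmoid(a + b) for a, b in
                zip(matvec(self.W_mz, memory), matvec(self.W_sz, summary))]
        return [(1.0 - z) * c + z * m for z, c, m in zip(gate, cand, memory)]


def run_memory(segments, dim, seed=0):
    """Carry one memory state across segments; return the state after each."""
    cell = GatedMemory(dim, seed)
    memory = [0.0] * dim
    states = []
    for segment in segments:
        memory = cell.update(memory, mean_pool(segment))
        states.append(memory)
    return states
```

In this sketch, each entry of `segments` is a list of feature vectors for one video segment (and, in a fuller model, its sentence so far). The state returned after segment t is what a decoder for segment t+1 would consult, which is how information about earlier sentences could discourage repetition and support coreference.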

Code Repositories

jayleicn/recurrent-transformer (official, PyTorch)

Benchmarks

Benchmark: video-captioning-on-activitynet-captions
Methodology: MART (ae-test split) - Appearance + Flow
Metrics: BLEU4: 10.33, CIDEr: 23.42, METEOR: 15.68
