5 months ago

Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

Hu Fan ; Chen Aozhu ; Wang Ziyue ; Zhou Fangming ; Dong Jianfeng ; Li Xirong

Abstract

In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval. Different from previous research that considers feature fusion only at one end, let it be video or text, we aim for feature fusion for both ends within a unified framework. We hypothesize that optimizing the convex combination of the features is preferred to modeling their correlations by computationally heavy multi-head self attention. We propose Lightweight Attentional Feature Fusion (LAFF). LAFF performs feature fusion at both early and late stages and at both video and text ends, making it a powerful method for exploiting diverse (off-the-shelf) features. The interpretability of LAFF can be used for feature selection. Extensive experiments on five public benchmark sets (MSR-VTT, MSVD, TGIF, VATEX and TRECVID AVS 2016-2020) justify LAFF as a new baseline for text-to-video retrieval.
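
The idea of fusing features by a learned convex combination can be illustrated with a short sketch. The code below is not the authors' implementation (the official PyTorch code is in the repository listed under Code Repositories). It assumes that each feature is first projected to a common dimension with a linear layer and `tanh`, that a single attention vector scores each projected feature, and that the scores are normalised with softmax. The function names, dimensions and random weights are all hypothetical and stand in for parameters that would be learned.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vec, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def init_linear(in_dim, out_dim, rng):
    """Random weights, standing in for parameters that would be learned."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def attentional_fusion(features, projections, attn_weight):
    """Fuse features of different sizes into one d-dimensional vector.

    features:    list of k vectors, possibly of different lengths
    projections: list of k (weight, bias) pairs mapping each feature to d dims
    attn_weight: length-d vector that scores each projected feature
    Returns (fused_vector, weights).
    """
    # 1. Bring every feature into a common d-dimensional space.
    projected = [
        [math.tanh(v) for v in linear(f, w, b)]
        for f, (w, b) in zip(features, projections)
    ]
    # 2. One scalar score per feature, normalised with softmax.
    scores = [sum(a * v for a, v in zip(attn_weight, p)) for p in projected]
    weights = softmax(scores)
    # 3. Convex combination: weighted sum of the projected features.
    d = len(attn_weight)
    fused = [sum(w * p[i] for w, p in zip(weights, projected)) for i in range(d)]
    return fused, weights


rng = random.Random(0)
dims, d = [8, 5, 3], 4  # three hypothetical features, fused into 4 dimensions
features = [[rng.gauss(0, 1) for _ in range(n)] for n in dims]
projections = [init_linear(n, d, rng) for n in dims]
attn_weight = [rng.uniform(-1, 1) for _ in range(d)]

fused, weights = attentional_fusion(features, projections, attn_weight)
print("fusion weights:", [round(w, 3) for w in weights])
print("fused vector length:", len(fused))
```

In this sketch each feature receives one scalar weight, so the cost grows linearly with the number of features, whereas self attention compares every pair of features. The weights are non-negative and sum to 1, which is what makes them readable as per-feature importance and, plausibly, usable for the feature selection the abstract mentions.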

Code Repositories

ruc-aimc-lab/laff (official, PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| ad-hoc-video-search-on-trecvid-avs16-iacc-3 | LAFF | infAP: 0.222 |
| ad-hoc-video-search-on-trecvid-avs17-iacc-3 | LAFF | infAP: 0.290 |
| ad-hoc-video-search-on-trecvid-avs18-iacc-3 | LAFF | infAP: 0.147 |
| ad-hoc-video-search-on-trecvid-avs19-v3c1 | LAFF | infAP: 0.192 |
| ad-hoc-video-search-on-trecvid-avs20-v3c1 | LAFF | infAP: 0.265 |
| video-retrieval-on-msr-vtt | LAFF | text-to-video R@1: 29.1; R@5: 54.9; R@10: 65.8 |
| video-retrieval-on-msr-vtt-1ka | LAFF | text-to-video R@1: 45.8; R@5: 71.5; R@10: 82 |
| video-retrieval-on-msvd | LAFF | text-to-video R@1: 45.4; R@5: 76.0; R@10: 84.6 |
| video-retrieval-on-tgif | LAFF | text-to-video R@1: 24.5; R@5: 45.0; R@10: 54.5 |
| video-retrieval-on-vatex | LAFF | text-to-video R@1: 59.1; R@10: 91.7; R@50: 96.3 |
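
For readers unfamiliar with the R@K figures above, the following is a minimal sketch of how Recall@K is commonly computed for text-to-video retrieval. It assumes a simplified setting in which text query i has exactly one relevant video, namely video i; the function name `recall_at_k` and the toy similarity scores are hypothetical and do not come from the paper or its benchmarks.

```python
def recall_at_k(similarity, ks=(1, 5, 10)):
    """Recall@K in percent, assuming query i's only relevant video is video i.

    similarity[i][j] is the score between text query i and video j.
    """
    ranks = []
    for i, row in enumerate(similarity):
        # Rank of the correct video = 1 + number of videos scored strictly higher.
        ranks.append(1 + sum(1 for s in row if s > row[i]))
    n = len(ranks)
    return {k: 100.0 * sum(1 for r in ranks if r <= k) / n for k in ks}


# Toy 4x4 similarity matrix (hypothetical scores, not from the paper).
sim = [
    [0.9, 0.1, 0.3, 0.2],  # correct video ranked 1st
    [0.8, 0.7, 0.1, 0.2],  # correct video ranked 2nd
    [0.5, 0.6, 0.4, 0.7],  # correct video ranked 4th
    [0.1, 0.2, 0.3, 0.9],  # correct video ranked 1st
]
print(recall_at_k(sim, ks=(1, 2, 4)))  # {1: 50.0, 2: 75.0, 4: 100.0}
```

R@K is therefore the percentage of queries whose correct video appears among the top K results, so higher is better and R@K can only grow with K.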
