TempCLR: Temporal Alignment Representation with Contrastive Learning

Yang Yuncong ; Ma Jiawei ; Huang Shiyuan ; Chen Long ; Lin Xudong ; Han Guangxing ; Chang Shih-Fu

Abstract

Video representation learning has been successful in video-text pre-training for zero-shot transfer, where each sentence is trained to be close to the paired video clips in a common feature space. For long videos, given a paragraph of description where the sentences describe different segments of the video, by matching all sentence-clip pairs, the paragraph and the full video are aligned implicitly. However, such unit-level comparison may ignore global temporal context, which inevitably limits the generalization ability. In this paper, we propose a contrastive learning framework TempCLR to compare the full video and the paragraph explicitly. As the video/paragraph is formulated as a sequence of clips/sentences, under the constraint of their temporal order, we use dynamic time warping to compute the minimum cumulative cost over sentence-clip pairs as the sequence-level distance. To explore the temporal dynamics, we break the consistency of temporal succession by shuffling video clips w.r.t. temporal granularity. Then, we obtain the representations for clips/sentences, which perceive the temporal information and thus facilitate the sequence alignment. In addition to pre-training on the video and paragraph, our approach can also generalize on the matching between video instances. We evaluate our approach on video retrieval, action step localization, and few-shot action recognition, and achieve consistent performance gain over all three tasks. Detailed ablation studies are provided to justify the approach design.
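To make the sequence-level distance described in the abstract concrete, the following is a minimal sketch of the textbook dynamic time warping recurrence applied to sentence and clip embeddings. It is not taken from the official repository: the function names `cosine_distance` and `dtw_distance`, the choice of cosine distance as the pairwise cost, the three allowed warping steps, and the toy 2-D embeddings are all assumptions made for illustration. The paper's actual cost function and contrastive loss may differ.

```python
import math


def cosine_distance(u, v):
    """Pairwise cost between two embeddings (assumed: 1 - cosine similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dtw_distance(sentences, clips):
    """Minimum cumulative cost of aligning two sequences in temporal order.

    Standard DTW: every sentence and every clip is matched at least once,
    and the alignment path never moves backwards in either sequence.
    """
    n, m = len(sentences), len(clips)
    inf = float("inf")
    # acc[i][j] = best cumulative cost aligning the first i sentences
    # with the first j clips.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(sentences[i - 1], clips[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j],      # same clip, next sentence
                acc[i][j - 1],      # same sentence, next clip
                acc[i - 1][j - 1],  # advance both
            )
    return acc[n][m]


# Toy embeddings (hypothetical): two sentences, three clips.
sentences = [[1.0, 0.0], [0.0, 1.0]]
clips_ordered = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Shuffling the clips breaks the temporal order, as in the abstract's
# description of how temporally inconsistent sequences are produced.
clips_shuffled = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]

d_pos = dtw_distance(sentences, clips_ordered)
d_neg = dtw_distance(sentences, clips_shuffled)
print(d_pos, d_neg)
```

In this toy case the correctly ordered video has a smaller sequence-level distance to the paragraph than the shuffled one, which is the kind of gap a sequence-level contrastive objective could exploit.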

Code Repositories

yyuncong/tempclr (official; PyTorch; mentioned in GitHub)

Benchmarks

Benchmark: long-video-retrieval-background-removed-on
Methodology: TempCLR
Metrics:
Cap. Avg. R@1: 74.5
Cap. Avg. R@5: 94.6
Cap. Avg. R@10: 97.0
DTW R@1: 83.5
DTW R@5: 97.2
DTW R@10: 99.3
OTAM R@1: 84.9
OTAM R@5: 97.9
OTAM R@10: 99.5
