TempCLR: Temporal Alignment Representation with Contrastive Learning
Yang Yuncong; Ma Jiawei; Huang Shiyuan; Chen Long; Lin Xudong; Han Guangxing; Chang Shih-Fu

Abstract
Video representation learning has been successful in video-text pre-training for zero-shot transfer, where each sentence is trained to be close to the paired video clips in a common feature space. For long videos, given a paragraph of description where the sentences describe different segments of the video, by matching all sentence-clip pairs, the paragraph and the full video are aligned implicitly. However, such unit-level comparison may ignore global temporal context, which inevitably limits the generalization ability. In this paper, we propose a contrastive learning framework TempCLR to compare the full video and the paragraph explicitly. As the video/paragraph is formulated as a sequence of clips/sentences, under the constraint of their temporal order, we use dynamic time warping to compute the minimum cumulative cost over sentence-clip pairs as the sequence-level distance. To explore the temporal dynamics, we break the consistency of temporal succession by shuffling video clips w.r.t. temporal granularity. Then, we obtain the representations for clips/sentences, which perceive the temporal information and thus facilitate the sequence alignment. In addition to pre-training on the video and paragraph, our approach can also generalize on the matching between video instances. We evaluate our approach on video retrieval, action step localization, and few-shot action recognition, and achieve consistent performance gain over all three tasks. Detailed ablation studies are provided to justify the approach design.
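The sequence-level distance described in the abstract can be illustrated with a minimal sketch of classic dynamic time warping over a sentence-clip cost matrix. This is not the authors' implementation: the cosine-based cost, the function names `cosine_cost` and `dtw_distance`, and the toy one-hot features are all assumptions made for illustration, and the actual TempCLR training objective is not reproduced here.

```python
import numpy as np


def cosine_cost(sentences, clips):
    """Pairwise cost (1 - cosine similarity), shape (num_sentences, num_clips).

    Assumption: cosine distance stands in for whatever cost the paper uses.
    """
    s = sentences / np.linalg.norm(sentences, axis=1, keepdims=True)
    c = clips / np.linalg.norm(clips, axis=1, keepdims=True)
    return 1.0 - s @ c.T


def dtw_distance(cost):
    """Minimum cumulative cost over order-preserving sentence-clip alignments."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three admissible predecessor cells.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    return acc[n, m]


# Toy example: three sentences, each matching exactly one clip (hypothetical features).
sentences = np.eye(3)
clips_in_order = np.eye(3)
clips_shuffled = clips_in_order[::-1]  # temporal order broken by reversing the clips

aligned = dtw_distance(cosine_cost(sentences, clips_in_order))
shuffled = dtw_distance(cosine_cost(sentences, clips_shuffled))
```

In this toy setup the shuffled clip sequence yields a larger distance than the correctly ordered one, which is the property that makes order-breaking shuffles usable as negatives in a contrastive setup.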
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| long-video-retrieval-background-removed-on | TempCLR | Cap. Avg. R@1: 74.5; Cap. Avg. R@5: 94.6; Cap. Avg. R@10: 97.0; DTW R@1: 83.5; DTW R@5: 97.2; DTW R@10: 99.3; OTAM R@1: 84.9; OTAM R@5: 97.9; OTAM R@10: 99.5 |