HyperAI



Motion-driven Visual Tempo Learning for Video-based Action Recognition

Yuanzhong Liu, Junsong Yuan, Zhigang Tu


Abstract

Action visual tempo characterizes the dynamics and the temporal scale of an action, which helps distinguish human actions that share high similarities in visual dynamics and appearance. Previous methods capture visual tempo either by sampling raw videos at multiple rates, which requires a costly multi-layer network to handle each rate, or by hierarchically sampling backbone features, which relies heavily on high-level features that miss fine-grained temporal dynamics. In this work, we propose a Temporal Correlation Module (TCM), which can be easily embedded into current action recognition backbones in a plug-and-play manner, to extract action visual tempo from low-level, single-layer backbone features. Specifically, our TCM contains two main components: a Multi-scale Temporal Dynamics Module (MTDM) and a Temporal Attention Module (TAM). MTDM applies a correlation operation to learn pixel-wise fine-grained temporal dynamics for both fast tempo and slow tempo. TAM adaptively emphasizes expressive features and suppresses inessential ones by analyzing global information across the various tempos. Extensive experiments conducted on several action recognition benchmarks, e.g. Something-Something V1 & V2, Kinetics-400, UCF-101, and HMDB-51, demonstrate that the proposed TCM improves the performance of existing video-based action recognition models by a large margin. The source code is publicly released at https://github.com/yzfly/TCM.
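The two components described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch rendition of the ideas only, not the authors' implementation (see the released code for that): a pixel-wise correlation between feature maps of two frames, where the frame gap stands in for fast vs. slow tempo, followed by a squeeze-and-excitation-style attention that reweights the stacked correlation channels. All class names, the neighborhood size `max_disp`, and the `reduction` ratio are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalCorrelation(nn.Module):
    """Sketch of MTDM's correlation step: for each pixel, correlate its
    feature at frame t with features in a local spatial neighborhood at
    frame t+delta (a small delta probes fast tempo, a larger one slow tempo)."""
    def __init__(self, max_disp=3):
        super().__init__()
        self.max_disp = max_disp  # neighborhood radius (assumed value)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C, H, W) backbone features from two frames
        b, c, h, w = feat_a.shape
        pad = self.max_disp
        feat_b = F.pad(feat_b, (pad, pad, pad, pad))
        corrs = []
        for dy in range(2 * pad + 1):
            for dx in range(2 * pad + 1):
                shifted = feat_b[:, :, dy:dy + h, dx:dx + w]
                # dot-product correlation, averaged over channels
                corrs.append((feat_a * shifted).mean(dim=1, keepdim=True))
        # one correlation map per displacement: (B, (2*pad+1)^2, H, W)
        return torch.cat(corrs, dim=1)

class TempoAttention(nn.Module):
    """Sketch of TAM: pool the stacked fast/slow correlation features
    globally, then reweight channels so expressive tempos are emphasized."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, C, H, W) correlation features from multiple tempos
        weights = self.fc(x.mean(dim=(2, 3)))     # global average pool -> (B, C)
        return x * weights[:, :, None, None]      # channel-wise reweighting
```

Chaining the two modules over frame pairs at several temporal strides would give one multi-tempo feature per location, which a backbone can then consume alongside its ordinary features.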

Code Repositories

yzfly/tcm (Official, PyTorch) — mentioned in GitHub
zphyix/tcm (Official, PyTorch)

Benchmarks

Benchmark                                    | Methodology    | Metrics
action-recognition-in-videos-on-something   | TCM (Ensemble) | Top-1 Accuracy: 67.8
action-recognition-in-videos-on-something-1 | TCM (Ensemble) | Top-1 Accuracy: 57.2

