5 months ago

Dual-path Adaptation from Image to Video Transformers

Park Jungin ; Lee Jiyoung ; Sohn Kwanghoon

Abstract

In this paper, we efficiently transfer the surpassing representation power of the vision foundation models, such as ViT and Swin, for video understanding with only a few trainable parameters. Previous adaptation methods have simultaneously considered spatial and temporal modeling with a unified learnable module but still fell short of fully leveraging the representative capabilities of image transformers. We argue that the popular dual-path (two-stream) architecture in video models can mitigate this problem. We propose a novel DualPath adaptation separated into spatial and temporal adaptation paths, where a lightweight bottleneck adapter is employed in each transformer block. Especially for temporal dynamic modeling, we incorporate consecutive frames into a grid-like frameset to precisely imitate vision transformers' capability that extrapolates relationships between tokens. In addition, we extensively investigate the multiple baselines from a unified perspective in video understanding and compare them with DualPath. Experimental results on four action recognition benchmarks prove that pretrained image transformers with DualPath can be effectively generalized beyond the data domain.
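
The two ideas named in the abstract, the grid-like frameset and the lightweight bottleneck adapter, can be illustrated with a small NumPy sketch. This is not the authors' implementation (the official repository is in PyTorch). The names `frames_to_grid` and `BottleneckAdapter`, the row-major frame ordering, the GELU activation, the residual connection, and the zero initialization of the up-projection are all assumptions made for illustration. The sketch also skips any frame resizing that might be applied before tiling.

```python
import numpy as np


def frames_to_grid(frames, rows, cols):
    """Tile rows*cols consecutive frames (T, H, W, C) into one image.

    Frames are placed in row-major order (an assumption), giving an
    array of shape (rows*H, cols*W, C) that an image model can consume.
    """
    t, h, w, c = frames.shape
    if t != rows * cols:
        raise ValueError("expected %d frames, got %d" % (rows * cols, t))
    grid = frames.reshape(rows, cols, h, w, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # -> (rows, H, cols, W, C)
    return grid.reshape(rows * h, cols * w, c)


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class BottleneckAdapter:
    """Down-project, apply a nonlinearity, up-project, add a residual.

    Hypothetical layout: the up-projection starts at zero so the adapter
    initially leaves the frozen backbone's features unchanged.
    """

    def __init__(self, dim, bottleneck, rng):
        self.w_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        self.b_down = np.zeros(bottleneck)
        self.w_up = np.zeros((bottleneck, dim))
        self.b_up = np.zeros(dim)

    def num_params(self):
        """Count the trainable parameters held by the adapter."""
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))

    def __call__(self, tokens):
        hidden = gelu(tokens @ self.w_down + self.b_down)
        return tokens + hidden @ self.w_up + self.b_up


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((4, 8, 8, 3))           # 4 frames of 8x8 RGB
    grid = frames_to_grid(clip, rows=2, cols=2)
    print(grid.shape)                          # one 16x16 RGB image
    adapter = BottleneckAdapter(dim=32, bottleneck=4, rng=rng)
    tokens = rng.random((10, 32))              # 10 tokens of width 32
    print(adapter(tokens).shape, adapter.num_params())
```

The adapter's parameter count grows with `2 * dim * bottleneck`, so a small bottleneck keeps the number of trainable parameters low compared with a full transformer block, which is the general motivation for adapter-style tuning.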

Code Repositories

park-jungin/dualpath
Official
pytorch

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| action-classification-on-diving-48 | DualPath w/ ViT-B/16 | Acc@1: 88.7 |
| action-classification-on-hmdb51 | DualPath w/ ViT-B/16 MLPs. | Acc@1: 75.6 |
| action-classification-on-kinetics-400 | DualPath w/ ViT-L/14 | Acc@1: 87.7, Acc@5: 97.8 |
| action-classification-on-kinetics-400 | DualPath w/ ViT-B/16 | Acc@1: 85.4, Acc@5: 97.1 |
| action-recognition-on-diving-48 | DUALPATH | Accuracy: 88.7 |
