Few-Shot Temporal Action Localization with Query Adaptive Transformer
Sauradip Nag, Xiatian Zhu, Tao Xiang

Abstract
Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation, preventing them from scaling to new classes. As a solution to this problem, few-shot TAL (FS-TAL) aims to adapt a model to a new class represented by as few as a single video. Existing FS-TAL methods assume trimmed training videos for new classes. However, this setting is not only unnatural (actions are typically captured in untrimmed videos) but also ignores background video segments, which contain vital contextual cues for foreground action segmentation. In this work, we first propose a new FS-TAL setting that uses untrimmed training videos. Further, we propose a novel FS-TAL model that maximizes knowledge transfer from the training classes while dynamically adapting to both the new class and each video of that class simultaneously. This is achieved by introducing a query adaptive Transformer into the model. Extensive experiments on two action localization benchmarks demonstrate that our method significantly outperforms all state-of-the-art alternatives in both single-domain and cross-domain scenarios. The source code can be found at https://github.com/sauradip/fewshotQAT
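To make the core idea concrete, below is a minimal sketch (not the authors' released code) of a query-adaptive Transformer layer for FS-TAL: snippet features of the untrimmed query video attend to snippet features of the few-shot support video, so the representation is conditioned on both the new class and the specific video. All module and variable names are illustrative assumptions; the official implementation is in the linked repository.

```python
import torch
import torch.nn as nn

class QueryAdaptiveTransformerLayer(nn.Module):
    """Illustrative layer: adapts query-video snippet features to a
    few-shot support video via cross-attention (an assumption of how
    the query adaptation could be wired, not the paper's exact design)."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, query_feats: torch.Tensor, support_feats: torch.Tensor):
        # query_feats:   (B, T_q, dim) snippets of the untrimmed query video
        # support_feats: (B, T_s, dim) snippets of the untrimmed support video
        x = query_feats
        # Self-attention models temporal context within the query video,
        # including background segments.
        x = self.norm1(x + self.self_attn(x, x, x)[0])
        # Cross-attention injects class evidence from the support video,
        # adapting features to the new class on the fly.
        x = self.norm2(x + self.cross_attn(x, support_feats, support_feats)[0])
        x = self.norm3(x + self.ffn(x))
        return x  # (B, T_q, dim), ready for a localization head

# Toy usage: one support video, one untrimmed query video.
layer = QueryAdaptiveTransformerLayer(dim=256, heads=8)
q = torch.randn(1, 100, 256)  # 100 query-video snippets
s = torch.randn(1, 64, 256)   # 64 support-video snippets
print(layer(q, s).shape)      # torch.Size([1, 100, 256])
```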
Code Repositories
https://github.com/sauradip/fewshotQAT
Benchmarks
| Benchmark | Method | mIoU |
|---|---|---|
| few-shot-temporal-action-localization-on | FS-QAT | 38.5 |
| few-shot-temporal-action-localization-on-1 | FS-QAT | 30.2 |