Few-Shot Temporal Action Localization with Query Adaptive Transformer
Sauradip Nag, Xiatian Zhu, Tao Xiang

Abstract
Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation, preventing them from scaling to new classes. As a solution to this problem, few-shot TAL (FS-TAL) aims to adapt a model to a new class represented by as few as a single video. Existing FS-TAL methods assume trimmed training videos for new classes. However, this setting is not only unnatural (actions are typically captured in untrimmed videos) but also ignores background video segments, which contain vital contextual cues for foreground action segmentation. In this work, we first propose a new FS-TAL setting that uses untrimmed training videos. Further, we propose a novel FS-TAL model that maximizes knowledge transfer from the training classes while dynamically adapting to both the new class and each video of that class simultaneously. This is achieved by introducing a query adaptive Transformer into the model. Extensive experiments on two action localization benchmarks demonstrate that our method significantly outperforms all state-of-the-art alternatives in both single-domain and cross-domain scenarios. The source code can be found at https://github.com/sauradip/fewshotQAT
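To make the core idea concrete, below is a minimal sketch (not the authors' released code) of a query-adaptive Transformer layer for FS-TAL: snippet features of the untrimmed query video attend to snippet features of the few-shot support video, so the representation is conditioned on both the new class and the specific video. All module and variable names are illustrative assumptions; the official implementation is in the linked repository.

```python
import torch
import torch.nn as nn

class QueryAdaptiveTransformerLayer(nn.Module):
    """Illustrative layer: adapts query-video snippet features to a
    few-shot support video via cross-attention (an assumption of how
    the query adaptation could be wired, not the paper's exact design)."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, query_feats: torch.Tensor, support_feats: torch.Tensor):
        # query_feats:   (B, T_q, dim) snippets of the untrimmed query video
        # support_feats: (B, T_s, dim) snippets of the untrimmed support video
        x = query_feats
        # Self-attention models temporal context within the query video,
        # including background segments.
        x = self.norm1(x + self.self_attn(x, x, x)[0])
        # Cross-attention injects class evidence from the support video,
        # adapting features to the new class on the fly.
        x = self.norm2(x + self.cross_attn(x, support_feats, support_feats)[0])
        x = self.norm3(x + self.ffn(x))
        return x  # (B, T_q, dim), ready for a localization head

# Toy usage: one support video, one untrimmed query video.
layer = QueryAdaptiveTransformerLayer(dim=256, heads=8)
q = torch.randn(1, 100, 256)  # 100 query-video snippets
s = torch.randn(1, 64, 256)   # 64 support-video snippets
print(layer(q, s).shape)      # torch.Size([1, 100, 256])
```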
Code Repositories
https://github.com/sauradip/fewshotQAT
Benchmarks
| Benchmark | Method | mIoU |
|---|---|---|
| few-shot-temporal-action-localization-on | FS-QAT | 38.5 |
| few-shot-temporal-action-localization-on-1 | FS-QAT | 30.2 |