PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points

Jing Tan Xiaotong Zhao Xintian Shi Bin Kang Limin Wang

Abstract

Traditional temporal action detection (TAD) usually handles untrimmed videos with a small number of action instances from a single label (e.g., ActivityNet, THUMOS). However, this setting can be unrealistic, as different classes of actions often co-occur in practice. In this paper, we focus on the task of multi-label temporal action detection, which aims to localize all action instances in a multi-label untrimmed video. Multi-label TAD is more challenging because it requires fine-grained class discrimination within a single video and precise localization of co-occurring instances. To address these challenges, we extend the sparse query-based detection paradigm from traditional TAD and propose PointTAD, a multi-label TAD framework. Specifically, PointTAD introduces a small set of learnable query points to represent the important frames of each action instance. This point-based representation provides a flexible mechanism to localize the discriminative frames at the boundaries as well as the important frames inside the action. Moreover, we perform action decoding with the Multi-level Interactive Module to capture both point-level and instance-level action semantics. Finally, PointTAD is an end-to-end trainable framework based solely on RGB input for easy deployment. We evaluate our method on two popular benchmarks and introduce the new metric of detection-mAP for multi-label TAD. Our model outperforms all previous methods by a large margin under the detection-mAP metric and also achieves promising results under the segmentation-mAP metric. Code is available at https://github.com/MCG-NJU/PointTAD.
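To make the point-based representation more concrete, here is a minimal sketch of how a set of query points could be collapsed into a temporal segment. The abstract does not say how PointTAD turns points into boundaries, so the min-max rule, the function name `points_to_segment`, and the sample values below are all assumptions made purely for illustration, not the authors' method.

```python
from typing import List, Tuple


def points_to_segment(points: List[float]) -> Tuple[float, float]:
    """Collapse temporal query points into a (start, end) segment.

    Assumption: the segment spans the earliest and latest point (min-max).
    This is an illustrative rule, not taken from the PointTAD paper.
    """
    if not points:
        raise ValueError("need at least one query point")
    return min(points), max(points)


# Hypothetical query points (in seconds) for one action instance:
# the outer points sit near the boundaries, the inner ones inside the action.
query_points = [12.4, 13.0, 15.2, 17.9, 18.6]
segment = points_to_segment(query_points)
print(segment)
```

Under this assumed rule, the points near the boundaries determine the segment extent, while the interior points remain available as sampling locations for features inside the action.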

Code Repositories

mcg-nju/pointtad (Official, pytorch)

Benchmarks

Benchmark: temporal-action-localization-on-multithumos-1
Methodology: PointTAD
Metrics:
Average mAP: 23.5
mAP IOU@0.1: 42.3
mAP IOU@0.2: 39.7
mAP IOU@0.3: 35.8
mAP IOU@0.4: 30.9
mAP IOU@0.5: 24.9
mAP IOU@0.6: 18.5
mAP IOU@0.7: 12.0
mAP IOU@0.8: 5.6
mAP IOU@0.9: 1.4
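The thresholds above refer to temporal IoU between a predicted and a ground-truth segment. The sketch below assumes the standard definition (intersection length over union length) and checks that the listed Average mAP matches the arithmetic mean of the nine per-threshold values. The function name `temporal_iou` and the example segments are hypothetical; the mAP numbers are copied from the listing.

```python
from typing import Tuple


def temporal_iou(pred: Tuple[float, float], gt: Tuple[float, float]) -> float:
    """Temporal IoU of two (start, end) segments, assuming the standard definition."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Per-threshold mAP values as listed for PointTAD on MultiTHUMOS.
map_at_iou = {
    0.1: 42.3, 0.2: 39.7, 0.3: 35.8,
    0.4: 30.9, 0.5: 24.9, 0.6: 18.5,
    0.7: 12.0, 0.8: 5.6, 0.9: 1.4,
}

# Mean over the nine thresholds; rounds to the listed Average mAP.
average_map = sum(map_at_iou.values()) / len(map_at_iou)
print(round(average_map, 1))  # 23.5

# Hypothetical segments in seconds: 5 s overlap over a 15 s union.
print(temporal_iou((10.0, 20.0), (15.0, 25.0)))
```

A prediction counts as correct at a given threshold only if its temporal IoU with a ground-truth instance of the same class reaches that threshold, which is why mAP falls steadily as the threshold rises from 0.1 to 0.9.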
