Shenghai Rong, Zilei Wang, Qinying Liu
Abstract
Temporal action proposal (TAP) aims to generate accurate candidates of action instances in an untrimmed video. Prior work has shown that context is critically important to this task. In this paper, we propose a novel hierarchical context network (HCN) to further explore snippet-level and proposal-level contexts, which are used to improve the representations of snippets and proposals, respectively. First, we observe that different scales of snippet-level context are not equally important for different action instances. We therefore incorporate a novel gating mechanism into the U-Net structure to capture content-adaptive snippet-level contexts. Second, to exploit proposal-level contexts, we propose an efficient, task-specific self-attention model. By stacking multiple attention models, we can explore proposal-level contexts deeply and over a wide range. Finally, to leverage both levels of context, we equip HCN with three branches that evaluate proposals from local to global perspectives. Our experiments on the ActivityNet-1.3 and THUMOS14 datasets show that HCN significantly outperforms previous TAP methods. Further experiments demonstrate that our method can substantially improve state-of-the-art action detection performance when combined with existing action classifiers.
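The abstract does not spell out the gating mechanism, so the following is only a minimal sketch of the general idea of content-adaptive fusion: a gate computed from each snippet's own features decides how much of a fine-scale versus a coarse-scale feature to keep. The function name `gated_fusion`, the gate weights `w_fine` and `w_coarse`, and the use of a single scalar sigmoid gate per snippet are all assumptions made for illustration, not the authors' formulation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(fine, coarse, w_fine, w_coarse, bias=0.0):
    """Fuse two per-snippet feature sequences with a content-dependent gate.

    fine, coarse: lists of feature vectors, one per snippet, same shape.
    w_fine, w_coarse: hypothetical gate weights (learned in a real model).
    Returns one fused feature vector per snippet.
    """
    fused = []
    for f, c in zip(fine, coarse):
        # The gate depends on the snippet's own content at both scales.
        logit = bias
        logit += sum(w * x for w, x in zip(w_fine, f))
        logit += sum(w * x for w, x in zip(w_coarse, c))
        g = sigmoid(logit)
        # Convex combination: g close to 1 favors the fine-scale feature.
        fused.append([g * x + (1.0 - g) * y for x, y in zip(f, c)])
    return fused
```

With all gate weights at zero the gate is 0.5 for every snippet and the fusion reduces to a plain average; non-zero weights let different snippets weight the two scales differently, which is the sense of "content-adaptive" illustrated here.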
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| temporal-action-localization-on-activitynet | HCN (I3D features) | mAP: 35.61; mAP IOU@0.5: 52.51; mAP IOU@0.75: 36.10; mAP IOU@0.95: 7.12 |
| temporal-action-proposal-generation-on | HCN | AR@100: 77.13; AUC (val): 68.78 |
| temporal-action-proposal-generation-on-thumos | HCN | AR@100: 50.86; AR@1000: 67.34; AR@200: 57.56; AR@50: 64.28 |
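For readers unfamiliar with the AR@N metric in the table, the sketch below shows one simplified way to compute it: a ground-truth instance counts as recalled if any of the top-N proposals in its video overlaps it with temporal IoU at or above a threshold, and the recall is averaged over a set of thresholds. The function names, the data layout, and the choice to take exactly N proposals per video are assumptions made for illustration; official benchmark toolkits differ in details such as how the proposal budget is defined.

```python
def temporal_iou(a, b):
    """Temporal IoU of two segments given as (start, end) pairs."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_recall_at_n(proposals, ground_truth, n, thresholds):
    """Average recall of the top-n proposals per video over tIoU thresholds.

    proposals: dict video_id -> list of (start, end, score).
    ground_truth: dict video_id -> list of (start, end).
    thresholds: iterable of tIoU thresholds to average over.
    """
    recalls = []
    for t in thresholds:
        hits = 0
        total = 0
        for video_id, instances in ground_truth.items():
            # Keep only the n highest-scoring proposals for this video.
            ranked = sorted(proposals.get(video_id, []),
                            key=lambda p: p[2], reverse=True)[:n]
            for gt in instances:
                total += 1
                if any(temporal_iou(gt, (s, e)) >= t for s, e, _ in ranked):
                    hits += 1
        recalls.append(hits / total if total else 0.0)
    return sum(recalls) / len(recalls) if recalls else 0.0
```

A typical call passes a list of thresholds such as `[0.5, 0.55, ..., 0.95]`; the AUC figure reported alongside AR@100 is commonly understood as the area under the curve of average recall plotted against the number of proposals.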