Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism
Sangyoun Lee Juho Jung Changdae Oh Sunghee Yun

Abstract
Temporal Action Localization (TAL) is a critical task in video analysis that identifies the precise start and end times of actions. Existing methods based on CNNs, RNNs, GCNs, and Transformers are limited in their ability to capture long-range dependencies and temporal causality. To address these challenges, we propose a novel TAL architecture built on the Selective State Space Model (S6). Our approach integrates the Feature Aggregated Bi-S6 block, the Dual Bi-S6 structure, and a recurrent mechanism to enhance temporal and channel-wise dependency modeling without increasing parameter complexity. Extensive experiments on benchmark datasets demonstrate state-of-the-art results, with mAP scores of 74.2% on THUMOS-14, 42.9% on ActivityNet, 29.6% on FineAction, and 45.8% on HACS. Ablation studies validate the method's effectiveness, showing that the Dual structure in the Stem module and the recurrent mechanism outperform traditional approaches. Our findings demonstrate the potential of S6-based models for TAL and pave the way for future research.
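To make the S6 idea in the abstract concrete, here is a minimal sketch of a generic selective state space recurrence applied in both temporal directions. It is not the authors' implementation: the function names `selective_scan` and `bidirectional_scan`, the single-channel setup, the diagonal `A`, and the choice to sum the forward and backward outputs are all assumptions made for illustration. The Feature Aggregated Bi-S6 block, the Dual Bi-S6 structure, and the recurrent mechanism are not specified on this page and are not reproduced here.

```python
import math

def selective_scan(x, delta, A, B, C):
    """Generic selective SSM recurrence for one channel (illustrative only).

    h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t
    y_t = C_t . h_t

    x:     list of T floats (input sequence)
    delta: list of T positive floats (per-step step sizes)
    A:     list of N floats (diagonal state matrix, assumed here)
    B, C:  lists of T lists of N floats (per-step input/output projections)
    """
    n = len(A)
    h = [0.0] * n
    ys = []
    for t in range(len(x)):
        for i in range(n):
            # Per-step parameters are what make the scan "selective".
            h[i] = math.exp(delta[t] * A[i]) * h[i] + delta[t] * B[t][i] * x[t]
        ys.append(sum(C[t][i] * h[i] for i in range(n)))
    return ys

def bidirectional_scan(x, delta, A, B, C):
    """Run the scan forward and backward in time, then sum (an assumption)."""
    fwd = selective_scan(x, delta, A, B, C)
    bwd = selective_scan(x[::-1], delta[::-1], A, B[::-1], C[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

# Toy usage: one state dimension, no decay (A = 0), unit projections.
x = [1.0, 0.0]
delta = [1.0, 1.0]
A = [0.0]
B = [[1.0], [1.0]]
C = [[1.0], [1.0]]
print(bidirectional_scan(x, delta, A, B, C))
```

In a full S6 layer, `delta`, `B`, and `C` would be computed from the input by learned projections; they are passed in directly here to keep the recurrence itself visible.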
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| temporal-action-localization-on-activitynet | RDFA-S6 (InternVideo2-6B) | mAP: 42.9; mAP IOU@0.5: 64.1; mAP IOU@0.75: 44.0; mAP IOU@0.95: 10.6 |
| temporal-action-localization-on-fineaction | RDFA-S6 (InternVideo2-6B) | mAP: 29.6; mAP IOU@0.5: 46.4; mAP IOU@0.75: 29.5; mAP IOU@0.95: 7.6 |
| temporal-action-localization-on-hacs | RDFA-S6 (InternVideo2-6B) | Average-mAP: 45.8; mAP@0.5: 66.4; mAP@0.75: 47.2; mAP@0.95: 14.3 |
| temporal-action-localization-on-thumos14 | RDFA-S6 (InternVideo2-6B) | Avg mAP (0.3:0.7): 74.2; mAP IOU@0.3: 88.7; mAP IOU@0.4: 84.6; mAP IOU@0.5: 78.2; mAP IOU@0.6: 66.6; mAP IOU@0.7: 51.9 |
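
The IoU thresholds in the metrics above refer to temporal intersection over union between a predicted segment and a ground-truth segment. The sketch below shows how that overlap is commonly computed and how a threshold decides whether a prediction can count as a match. The function names and the example segments are hypothetical, and the benchmarks' official evaluation scripts may differ in details such as class matching and ranking by confidence score.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, in seconds."""
    # Overlap length, clamped at zero when the segments are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def matches_at_threshold(pred, gt, threshold):
    """True if the prediction overlaps the ground truth enough to match."""
    return temporal_iou(pred, gt) >= threshold

# Hypothetical segments: prediction 2s to 8s, ground truth 4s to 10s.
pred, gt = (2.0, 8.0), (4.0, 10.0)
print(temporal_iou(pred, gt))                # 0.5
print(matches_at_threshold(pred, gt, 0.5))   # True
print(matches_at_threshold(pred, gt, 0.75))  # False
```

The same prediction matches at a loose threshold and fails at a strict one, which is why mAP falls as the IoU threshold rises in every row of the table.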