| AdaTAD (VideoMAEv2-giant) | - | - | 89.7 | 86.7 | 80.9 | End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames | |
| RDFA-S6 (InternVideo2-6B) | - | - | 88.7 | 84.6 | 78.2 | Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism | |
| ActionMamba (InternVideo2-6B) | - | - | 86.89 | 83.09 | 76.90 | Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | |
| TriDet (VideoMAE V2-g features) | - | - | 84.8 | 80.0 | 73.3 | Temporal Action Localization with Enhanced Instant Discriminability | |
| ActionFormer (VideoMAE V2-g features) | - | - | 84.0 | 79.6 | 73.0 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | |
| TriDet (I3D features) | - | - | 83.6 | 80.1 | 72.9 | TriDet: Temporal Action Detection with Relative Boundary Modeling | |
| TemporalMaxer (I3D features) | - | - | 82.8 | 78.9 | 71.8 | TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization | |
| ASL (I3D features) | - | - | 83.1 | 79.0 | 71.7 | Action Sensitivity Learning for Temporal Action Localization | - |
| ActionFormer (I3D features) | - | - | 82.1 | 77.8 | 71.0 | ActionFormer: Localizing Moments of Actions with Transformers | |
| DualDETR (I3D features) | - | - | 82.9 | 78.0 | 70.4 | Dual DETRs for Multi-Label Temporal Action Detection | - |
| BasicTAD (160,6,192,R50-SlowOnly) | - | - | 75.5 | 70.8 | 63.5 | BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | |
| TadML (two-stream) | - | - | 73.29 | 69.73 | 62.53 | TadML: A fast temporal action detection with Mechanics-MLP | |
| TadTR | - | - | 74.8 | 69.1 | 60.1 | End-to-end Temporal Action Detection with Transformer | |
| BasicTAD (112,3,96,R50-SlowOnly) | - | - | 68.4 | 65.0 | 58.6 | BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | |
| ReAct (TSN features) | - | - | 69.2 | 65.0 | 57.1 | ReAct: Temporal Action Detection with Relational Queries | |
| AVFusion | - | - | 70.1 | 64.9 | 57.1 | Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization | |
| TAGS (I3D) | - | - | 68.6 | 63.8 | 57.0 | Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning | |
| MUSES | - | - | 68.9 | 64.0 | 56.9 | Multi-shot Temporal Event Localization: a Benchmark | |
| TadML (RGB-only) | - | - | 68.78 | 64.66 | 56.61 | TadML: A fast temporal action detection with Mechanics-MLP | |
| E2E-TAD (SlowFast R50+TadTR) | - | - | 69.4 | 64.3 | 56.0 | An Empirical Study of End-to-End Temporal Action Detection | |