| AdaFocus (newly extracted I3D-features, LT-Context model) | 78.0 | 76.2 | 78.3 | 82.1 | 79.0 | 67.5 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - |
| ASQuery | 77.9 | 74.6 | 78.4 | 80.7 | 76.5 | 66.5 | ASQuery: A Query-based Model for Action Segmentation | - |
| DiffAct | 76.4 | 73.6 | 78.4 | 80.3 | 75.9 | 64.6 | Diffusion Action Segmentation | |
| FACT (efficient hybrid of convolution and transformer model) | 76.2 | 74.7 | 79.7 | 81.4 | 76.5 | 66.2 | FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation | - |
| ASFormer | 73.5 | 68.0 | 75.0 | 76.0 | 70.6 | 57.4 | ASFormer: Transformer for Action Segmentation | |