| SG-DETR (w/ PT) | 74.20 | 60.40 | 58.80 | 76.20 | 60.80 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection | |
| UniVTG (w/ PT) | 65.43 | 50.06 | 43.63 | 64.06 | 45.02 | UniVTG: Towards Unified Video-Language Temporal Grounding | |
| LA-DETR | 63.94 | 51.10 | 47.93 | 65.65 | 49.44 | Length-Aware DETR for Robust Moment Retrieval | |
| BAM-DETR (w/ PT ASR Captions) | 63.88 | 47.92 | 46.67 | 66.33 | 48.22 | BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | |