Natural Language Moment Retrieval on MAD
Evaluation Metrics
R@1,IoU=0.1
R@1,IoU=0.3
R@1,IoU=0.5
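The metrics above are typically read as: the share of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal IoU at or above the stated threshold. A minimal sketch of that computation follows. The function names `temporal_iou` and `recall_at_1`, the `(start, end)` interval format in seconds, the `>=` comparison, and the toy data are all illustrative assumptions, not taken from any of the listed papers or from an official evaluation script.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) intervals, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, threshold):
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumption: a prediction counts as a hit when IoU >= threshold.
    """
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Hypothetical toy data: one top-1 prediction and one ground truth per query.
preds = [(10.0, 20.0), (5.0, 6.0), (30.0, 40.0), (0.0, 2.0)]
gts = [(14.0, 24.0), (5.5, 9.5), (100.0, 110.0), (0.0, 2.0)]

for thr in (0.1, 0.3, 0.5):
    print(f"R@1,IoU={thr}: {recall_at_1(preds, gts, thr):.1f}")
```

On this toy data the score drops as the threshold tightens, which matches the pattern in the results table, where each model's value decreases from IoU=0.1 to IoU=0.5.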
Results
Performance of each model on this benchmark
| Model | R@1,IoU=0.1 | R@1,IoU=0.3 | R@1,IoU=0.5 | Paper Title | Repository |
|---|---|---|---|---|---|
| ReVisionLLM | 17.3 | 12.7 | 6.7 | ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos | |
| RGNet | 12.43 | 9.48 | 5.61 | RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos | |
| Zero-Shot CLIP + Guidance Model | 9.3 | 4.65 | 2.16 | Localizing Moments in Long Video Via Multimodal Guidance | |
| CLIP | 6.57 | 3.13 | 1.39 | MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions | |
| VLG-Net + Guidance Model | 5.60 | 4.28 | 2.48 | Localizing Moments in Long Video Via Multimodal Guidance | |
| VLG-Net | 3.50 | 2.63 | 1.61 | MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions | |
| Random Chance | 0.09 | 0.04 | 0.01 | MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions | |