Moment Retrieval on Charades-STA

Evaluation Metrics

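R@1 IoU=0.5: percentage of queries for which the top-1 predicted moment has a temporal IoU of at least 0.5 with the ground-truth moment
R@1 IoU=0.7: the same, with a stricter IoU threshold of 0.7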
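
To make the metric concrete, here is a minimal Python sketch of R@1 at a given IoU threshold. It assumes each query has exactly one top-1 predicted interval and one ground-truth interval, both given as (start, end) in seconds. The function names `temporal_iou` and `recall_at_1` and the sample intervals are hypothetical, and the official Charades-STA evaluation code may differ in details such as whether the threshold comparison is inclusive.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two 1-D intervals given as (start, end)."""
    # Overlap length; clamped to 0 when the intervals are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union length = sum of both lengths minus the overlap.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, iou_threshold):
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumption: one top-1 prediction per query, aligned with gts by index.
    """
    hits = sum(
        temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)


# Hypothetical toy data: three queries.
preds = [(0.0, 5.0), (2.0, 8.0), (10.0, 12.0)]
gts = [(0.0, 4.0), (3.0, 10.0), (0.0, 5.0)]  # IoUs: 0.8, 0.625, 0.0

print(f"R@1 IoU=0.5: {recall_at_1(preds, gts, 0.5):.2f}")  # 66.67
print(f"R@1 IoU=0.7: {recall_at_1(preds, gts, 0.7):.2f}")  # 33.33
```

The second query (IoU 0.625) counts as a hit at IoU=0.5 but not at IoU=0.7, which is why the stricter column in the table below is consistently lower.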

Results

Performance of each model on this benchmark:

| Model | R@1 IoU=0.5 (%) | R@1 IoU=0.7 (%) | Paper Title |
| --- | --- | --- | --- |
| SG-DETR (w/ PT) | 71.10 | 52.80 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection |
| LLaVA-MR | 70.65 | 49.58 | LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval |
| FlashVTG | 70.32 | 49.87 | FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding |
| SG-DETR | 70.20 | 49.50 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection |
| InternVideo2-6B | 70.03 | 48.95 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| InternVideo2-1B | 68.36 | 45.03 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| VideoChat-T (FT) | 67.1 | 43.0 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning |
| UniMD+Sync. | 63.98 | 44.46 | UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection |
| LD-DETR | 62.58 | 41.56 | LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection |
| VideoLights-B-pt | 61.96 | 41.05 | VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval |
| UnLoc-L | 60.8 | 38.4 | UnLoc: A Unified Framework for Video Localization Tasks |
| BAM-DETR | 59.95 | 39.38 | BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos |
| BM-DETR | 59.48 | 38.33 | Background-aware Moment Detection for Video Moment Retrieval |
| UVCOM | 59.25 | 36.64 | Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection |
| CG-DETR | 58.44 | 36.34 | Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding |
| LLMEPET | 58.31 | 36.49 | Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval |
| UnLoc-B | 58.1 | 35.4 | UnLoc: A Unified Framework for Video Localization Tasks |
| QD-DETR (Only Video) | 57.31 | 32.55 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection |
| video-mamba-suite | 57.18 | 36.05 | Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding |
| Moment-DETR w/ PT (on 10K HowTo100M videos) | 55.65 | 34.17 | QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries |