Zero-Shot Video Retrieval on VATEX
Evaluation Metrics
text-to-video R@1
text-to-video R@10
video-to-text R@1
video-to-text R@10
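R@K (Recall at K) is commonly read as the percentage of queries whose correct match appears among the top K retrieved candidates. The sketch below illustrates that reading under simplifying assumptions that do not come from this page: a square similarity matrix, exactly one correct candidate per query (query i pairs with candidate i), and made-up toy scores. The function names `recall_at_k` and `transpose` are illustrative, and the evaluation protocols used by the listed papers may differ, for example in how multiple captions per video are handled.

```python
def recall_at_k(similarity, k):
    """Percentage of queries whose correct candidate ranks in the top k.

    Assumption: similarity[i][j] scores query i against candidate j,
    and the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank = number of candidates scored strictly higher
        # than the correct one.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


def transpose(matrix):
    """Swap query and candidate roles (text-to-video <-> video-to-text)."""
    return [list(column) for column in zip(*matrix)]


# Toy text-by-video similarity matrix (illustrative numbers only).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]

t2v_r1 = recall_at_k(sim, 1)             # text-to-video R@1
t2v_r2 = recall_at_k(sim, 2)             # text-to-video R@2
v2t_r1 = recall_at_k(transpose(sim), 1)  # video-to-text R@1
```

Transposing the matrix reuses the same function for both retrieval directions, which is why text-to-video and video-to-text scores can differ for the same model.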
Evaluation Results
Performance of each model on this benchmark.
| Model | text-to-video R@1 | text-to-video R@10 | video-to-text R@1 | video-to-text R@10 | Paper Title | Repository |
|---|---|---|---|---|---|---|
| GRAM | 83.9 | 99.5 | 82.7 | 99 | Gramian Multimodal Representation Learning and Alignment | |
| InternVideo2-6B | 71.5 | 97.1 | 85.3 | 99.3 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| InternVideo2-1B | 70.4 | 96.9 | 85.4 | 99.1 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| VideoCoCa | 53.2 | 90.1 | 73.6 | 97.2 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| InternVideo | 49.5 | - | 69.5 | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |