Zero-Shot Video Retrieval on DiDeMo

Evaluation Metrics

text-to-video R@1
text-to-video R@10
text-to-video R@5
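
For readers unfamiliar with the notation, R@K (Recall at K) is conventionally the percentage of text queries whose ground-truth video appears among the top K retrieved videos. The following is a minimal sketch of that computation, not the evaluation code used by any listed model. It assumes a square similarity matrix in which text i is paired with video i, and it resolves ties in favor of the correct video; the function name `recall_at_k` and the toy scores are illustrative only.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose matching video ranks in the top k.

    Assumption: similarity[i][j] scores text i against video j, and the
    correct video for text i is video i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the correct video: how many videos score
        # strictly higher than it does for this text query.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 similarity matrix (made-up numbers, not from any model).
sim = [
    [0.9, 0.1, 0.3],  # text 0: correct video ranked 1st
    [0.8, 0.5, 0.6],  # text 1: correct video ranked 3rd
    [0.2, 0.7, 0.4],  # text 2: correct video ranked 2nd
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(sim, k):.1f}")
```

On this toy matrix one of three queries is a hit at K=1, two at K=2, and all three at K=3. R@K can only stay the same or grow as K increases, which is why every row in the results reports R@10 at least as high as R@5.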

Evaluation Results

Performance of each model on this benchmark

| Model | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | Paper Title | Repository |
|---|---|---|---|---|---|
| InternVideo2-6B | 57.9 | 84.6 | 80.0 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| InternVideo2-1B | 57.0 | 85.1 | 80.0 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| VAST | 55.5 | 79.6 | 74.3 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | |
| GRAM | 54.2 | 80.7 | - | Gramian Multimodal Representation Learning and Alignment | |
| vid-TLDR (UMT-L) | 52.0 | 81.0 | 74.0 | vid-TLDR: Training Free Token merging for Light-weight Video Transformer | |
| UMT-L (ViT-L/16) | 48.6 | 79.0 | 72.9 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
| mPLUG-2 | 45.7 | 79.2 | 71.1 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | |
| HiTeA-17M | 43.2 | 79.0 | 69.3 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| LanguageBind(ViT-H/14) | 39.9 | 74.6 | 66.1 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | |
| LanguageBind(ViT-L/14) | 39.7 | 73.8 | 65.5 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | |
| Singularity-17M | 37.1 | 69.9 | 61.7 | Revealing Single Frame Bias for Video-and-Language Learning | |
| Singularity-5M | 36.9 | 69.3 | 61.1 | Revealing Single Frame Bias for Video-and-Language Learning | |
| HiTeA-5M | 36.1 | 70.3 | 60.1 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| BT-Adapter | 35.6 | 72.6 | 61.9 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | |
| OmniVL | 33.3 | 68.5 | 58.7 | OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | - |
| InternVideo | 31.5 | 68.2 | 57.6 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
| Clover | 29.5 | 66.3 | 55.2 | Clover: Towards A Unified Video-Language Alignment and Fusion Model | |
| MILES | 27.2 | 63.6 | 50.3 | - | - |
| Y. Ge et al. | 25.6 | 61.1 | 50.6 | Bridging Video-text Retrieval with Multiple Choice Questions | |
| ALPRO | 23.8 | 57.9 | 47.3 | Align and Prompt: Video-and-Language Pre-training with Entity Prompts | |