Visual Question Answering On Msrvtt Qa 1

Evaluation Metric

Accuracy
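MSRVTT-QA is commonly treated as open-ended question answering, where accuracy is the fraction of test questions whose predicted answer matches the ground-truth answer. The sketch below computes it under one assumption: answers are compared by exact string match after lowercasing and trimming whitespace. This page does not state the normalization each paper used, and `qa_accuracy` is a hypothetical helper, not an official evaluation script.

```python
def qa_accuracy(predictions, references):
    """Fraction of questions whose predicted answer exactly matches the reference.

    Assumption (hypothetical helper): answers are compared after lowercasing
    and stripping surrounding whitespace; official protocols may differ.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches after light normalization.
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    preds = ["man", "dog", "Two", "cooking"]
    refs = ["man", "cat", "two", "cooking"]
    print(f"{qa_accuracy(preds, refs):.3f}")  # 0.750
```

The scores in the table below are reported on this 0-to-1 scale, so 0.496 corresponds to 49.6% of questions answered correctly.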

Evaluation Results

Performance of each model on this benchmark

| Model | Accuracy | Paper Title | Repository |
|---|---|---|---|
| VLAB | 0.496 | VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | - |
| MaMMUT | 0.495 | MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | |
| mPLUG-2 | 0.480 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | |
| MuLTI | 0.478 | MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling | - |
| Flamingo | 0.474 | Flamingo: a Visual Language Model for Few-Shot Learning | |
| UMT-L (ViT-L/16) | 0.471 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
| InternVideo | 0.471 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
| vid-TLDR (UMT-L) | 0.470 | vid-TLDR: Training Free Token merging for Light-weight Video Transformer | |
| FrozenBiLM+ | 0.470 | Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models | |
| VideoCoCa | 0.463 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| HBI | 0.462 | Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning | |
| HiTeA | 0.459 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| EMCL-Net | 0.458 | Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | |
| Co-Tokenization | 0.457 | Video Question Answering with Iterative Video-Text Co-Tokenization | - |
| X2-VLM (large) | 0.455 | X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | |
| X2-VLM (base) | 0.450 | X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | |
| All-in-one-B | 0.443 | All in One: Exploring Unified Video-Language Pre-training | |
| Clover | 0.441 | Clover: Towards A Unified Video-Language Alignment and Fusion Model | |
| OmniVL | 0.441 | OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | - |
| AIO+MIF | 0.440 | Self-Adaptive Sampling for Efficient Video Question-Answering on Image-Text Models | |
Visual Question Answering On Msrvtt Qa 1 | SOTA | HyperAI超神经