Video Captioning on MSR-VTT 1

Evaluation Metrics

BLEU-4
CIDEr
METEOR
ROUGE-L

Evaluation Results

Performance of each model on this benchmark

| Model | BLEU-4 | CIDEr | METEOR | ROUGE-L | Paper Title | Repository |
|---|---|---|---|---|---|---|
| mPLUG-2 | 57.8 | 80.0 | 34.9 | 70.1 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | |
| VAST | 56.7 | 78.0 | - | - | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | |
| GIT2 | 54.8 | 75.9 | 33.1 | 68.2 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| VLAB | 54.6 | 74.9 | 33.4 | 68.3 | VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | - |
| COSA | 53.7 | 74.7 | - | - | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | |
| VALOR | 54.4 | 74.0 | 32.9 | 68.0 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | |
| MaMMUT (ours) | - | 73.6 | - | - | MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | |
| VideoCoCa | 53.8 | 73.2 | - | 68.0 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| RTQ | 49.6 | 69.3 | - | 66.1 | RTQ: Rethinking Video-language Understanding Based on Image-text Model | |
| HowToCaption | 49.8 | 65.3 | 32.2 | 66.3 | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | |
| HiTeA | 49.2 | 65.1 | 30.7 | 65.0 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| Vid2Seq | - | 64.6 | 30.8 | - | Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | |
| TextKG | 46.6 | 60.8 | 30.5 | 64.8 | Text with Knowledge Graph Augmented Transformer for Video Captioning | - |
| IcoCap (ViT-B/16) | 47.0 | 60.2 | 31.1 | 64.9 | IcoCap: Improving Video Captioning by Compounding Images | - |
| MV-GPT | 48.9 | 60.0 | 38.7 | 64.0 | End-to-end Generative Pretraining for Multimodal Video Captioning | - |
| IcoCap (ViT-B/32) | 46.1 | 59.1 | 30.3 | 64.3 | IcoCap: Improving Video Captioning by Compounding Images | - |
| CLIP-DCD | 48.2 | 58.7 | 31.3 | 64.8 | CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter | |
| VIOLETv2 | - | 58 | - | - | An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | |
| CoCap (ViT/L14) | 44.4 | 57.2 | 30.3 | 63.4 | Accurate and Fast Compressed Video Captioning | |
| VASTA (Vatex-backbone) | 44.21 | 56.08 | 30.24 | 62.9 | Diverse Video Captioning by Adaptive Spatio-temporal Attention | |