Video Question Answering on iVQA
Evaluation Metric
Accuracy
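Because each iVQA question comes with several human reference answers, "accuracy" here is not necessarily plain exact match. The sketch below assumes a VQA-style score. A prediction earns min(matches / 2, 1) credit, where matches counts the reference answers it equals after simple lowercasing and trimming. Whether this matches the official iVQA evaluation script is an assumption, and the function name `vqa_style_accuracy` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def vqa_style_accuracy(
    predictions: Sequence[str],
    references: Sequence[Sequence[str]],
    threshold: int = 2,
) -> float:
    """Return accuracy in percent (hypothetical helper, VQA-style scoring assumed).

    Each prediction gets min(matches / threshold, 1) credit, where `matches`
    is how many reference answers equal it after lowercasing and stripping.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, refs in zip(predictions, references):
        norm = pred.strip().lower()  # simplified normalization (assumption)
        matches = sum(1 for r in refs if r.strip().lower() == norm)
        total += min(matches / threshold, 1.0)  # partial credit, capped at 1
    return 100.0 * total / len(predictions)


if __name__ == "__main__":
    preds = ["dog", "cat", "car"]
    refs = [
        ["dog", "dog", "puppy", "dog", "dog"],          # 4 matches -> 1.0
        ["cat", "kitten", "kitten", "kitten", "kitten"],  # 1 match  -> 0.5
        ["bus", "bus", "bus", "truck", "bus"],          # 0 matches -> 0.0
    ]
    print(vqa_style_accuracy(preds, refs))  # 50.0
```

Capping the credit at 1 means a single annotator's idiosyncratic answer earns only half credit, while any answer shared by at least two annotators counts as fully correct.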
Evaluation Results
Performance of each model on this benchmark:
| Model | Accuracy (%) | Paper Title | Repository |
|---|---|---|---|
| Text + Text (no Multimodal Pretext Training) | 40.2 | Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval | |
| FrozenBiLM | 39.6 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | |
| VideoCoCa | 39.0 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| Co-Tokenization | 38.2 | Video Question Answering with Iterative Video-Text Co-Tokenization | - |
| Just Ask (fine-tune) | 35.4 | Just Ask: Learning to Answer Questions from Millions of Narrated Videos | |
| FrozenBiLM (0-shot) | 26.8 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | |
| Just Ask (0-shot) | 12.2 | Just Ask: Learning to Answer Questions from Millions of Narrated Videos | |