Question Answering on NExT-QA (Open-Ended)
Evaluation Metrics
Accuracy
Confidence Score
Evaluation Results
Performance of each model on this benchmark
| Model | Accuracy | Confidence Score | Paper Title | Repository |
|---|---|---|---|---|
| Flash-VStream | 61.6 | 3.4 | Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | |
| Vista-LLaMA | 60.7 | 3.4 | Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens | - |
| VideoChat | 56.6 | 3.2 | VideoChat: Chat-Centric Video Understanding | |
| MovieChat+ | 54.8 | 3.0 | MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | |
| Video-ChatGPT | 54.6 | 3.2 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | |
| MovieChat | 49.9 | 2.7 | MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | |