VCGBench-Diverse on VideoInstruct
Metrics
Consistency
Contextual Understanding
Correctness of Information
Dense Captioning
Detail Orientation
Reasoning
Spatial Understanding
Temporal Understanding
Mean
Results
Performance results of various models on this benchmark
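| Model | Consistency | Contextual Understanding | Correctness of Information | Dense Captioning | Detail Orientation | Reasoning | Spatial Understanding | Temporal Understanding | Mean | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|---|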
| VideoGPT+ | 2.59 | 2.81 | 2.46 | 1.38 | 2.73 | 3.63 | 2.80 | 1.78 | 2.47 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | |
| Chat-UniVi | 2.36 | 2.66 | 2.29 | 1.33 | 2.56 | 3.59 | 2.36 | 1.56 | 2.29 | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | |
| VTimeLLM | 2.35 | 2.48 | 2.16 | 1.13 | 2.41 | 3.45 | 2.29 | 1.46 | 2.17 | VTimeLLM: Empower LLM to Grasp Video Moments | |
| BT-Adapter | 2.27 | 2.59 | 2.20 | 1.03 | 2.62 | 3.62 | 2.35 | 1.29 | 2.19 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | |
| VideoChat2 | 2.27 | 2.51 | 2.13 | 1.26 | 2.42 | 3.13 | 2.43 | 1.66 | 2.20 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | |
| Video-ChatGPT | 2.06 | 2.46 | 2.07 | 0.89 | 2.42 | 3.60 | 2.25 | 1.39 | 2.08 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | |