HyperAI

VCGBench-Diverse on VideoInstruct

Metrics

Consistency
Contextual Understanding
Correctness of Information
Dense Captioning
Detail Orientation
Reasoning
Spatial Understanding
Temporal Understanding
mean

Results

Performance results of various models on this benchmark

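| Model | Consistency | Contextual Understanding | Correctness of Information | Dense Captioning | Detail Orientation | Reasoning | Spatial Understanding | Temporal Understanding | mean | Paper Title |
|---|---|---|---|---|---|---|---|---|---|---|
| VideoGPT+ | 2.59 | 2.81 | 2.46 | 1.38 | 2.73 | 3.63 | 2.80 | 1.78 | 2.47 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding |
| Chat-UniVi | 2.36 | 2.66 | 2.29 | 1.33 | 2.56 | 3.59 | 2.36 | 1.56 | 2.29 | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding |
| VTimeLLM | 2.35 | 2.48 | 2.16 | 1.13 | 2.41 | 3.45 | 2.29 | 1.46 | 2.17 | VTimeLLM: Empower LLM to Grasp Video Moments |
| BT-Adapter | 2.27 | 2.59 | 2.20 | 1.03 | 2.62 | 3.62 | 2.35 | 1.29 | 2.19 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning |
| VideoChat2 | 2.27 | 2.51 | 2.13 | 1.26 | 2.42 | 3.13 | 2.43 | 1.66 | 2.20 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark |
| Video-ChatGPT | 2.06 | 2.46 | 2.07 | 0.89 | 2.42 | 3.60 | 2.25 | 1.39 | 2.08 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models |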
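
The page does not say how the mean column is computed. The sketch below is a sanity check under an assumption inferred from the numbers alone: that the mean averages five of the eight metrics (Consistency, Contextual Understanding, Correctness of Information, Detail Orientation, Temporal Understanding) and leaves out Dense Captioning, Reasoning, and Spatial Understanding. The names `ROWS`, `ASSUMED_MEAN_METRICS`, `mean_over`, and `check_reported_means` are illustrative and not part of any benchmark tooling.

```python
# Metric order follows the Metrics list on the page.
METRICS = [
    "Consistency",
    "Contextual Understanding",
    "Correctness of Information",
    "Dense Captioning",
    "Detail Orientation",
    "Reasoning",
    "Spatial Understanding",
    "Temporal Understanding",
]

# model -> (eight per-metric scores, reported mean), transcribed from the table.
ROWS = {
    "VideoGPT+":     ([2.59, 2.81, 2.46, 1.38, 2.73, 3.63, 2.80, 1.78], 2.47),
    "Chat-UniVi":    ([2.36, 2.66, 2.29, 1.33, 2.56, 3.59, 2.36, 1.56], 2.29),
    "VTimeLLM":      ([2.35, 2.48, 2.16, 1.13, 2.41, 3.45, 2.29, 1.46], 2.17),
    "BT-Adapter":    ([2.27, 2.59, 2.20, 1.03, 2.62, 3.62, 2.35, 1.29], 2.19),
    "VideoChat2":    ([2.27, 2.51, 2.13, 1.26, 2.42, 3.13, 2.43, 1.66], 2.20),
    "Video-ChatGPT": ([2.06, 2.46, 2.07, 0.89, 2.42, 3.60, 2.25, 1.39], 2.08),
}

# ASSUMPTION: this subset is inferred from the numbers, not stated on the page.
ASSUMED_MEAN_METRICS = [
    "Consistency",
    "Contextual Understanding",
    "Correctness of Information",
    "Detail Orientation",
    "Temporal Understanding",
]


def mean_over(scores, names):
    """Average the scores of the named metrics."""
    picked = [scores[METRICS.index(name)] for name in names]
    return sum(picked) / len(picked)


def check_reported_means(tol=0.005 + 1e-9):
    """Return model -> True if the assumed mean matches the reported one
    to within two-decimal rounding."""
    return {
        model: abs(mean_over(scores, ASSUMED_MEAN_METRICS) - reported) <= tol
        for model, (scores, reported) in ROWS.items()
    }


if __name__ == "__main__":
    for model, ok in check_reported_means().items():
        print(f"{model}: {'consistent' if ok else 'mismatch'}")
```

If the assumption holds for every row, the mean column should not be read as an average over all eight metrics, which matters when comparing these numbers against other leaderboards.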