VideoChat: Chat-Centric Video Understanding

KunChang Li; Yinan He; Yi Wang; Yizhuo Li; Wenhai Wang; Ping Luo; Yali Wang; Limin Wang; Yu Qiao

Abstract

In this paper, we initiate an attempt to develop an end-to-end chat-centric video understanding system, coined VideoChat. It integrates video foundation models and large language models via a learnable neural interface and excels in spatiotemporal reasoning, event localization, and causal relationship inference. To instruction-tune this system, we build a video-centric instruction dataset composed of thousands of videos paired with detailed descriptions and conversations. This dataset emphasizes spatiotemporal reasoning and captures causal relationships, providing a valuable asset for training our chat-centric video understanding system. Preliminary qualitative experiments demonstrate the potential of our system across a broad spectrum of video applications; it could serve as a simple prototype system for future research on chat-centric video understanding. Our code and data are available at https://github.com/OpenGVLab/Ask-Anything
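
The abstract says VideoChat joins a video foundation model to a large language model through a learnable neural interface, but does not spell out its design. The sketch below assumes a query-based interface: a small, fixed set of learnable query vectors cross-attends over the video tokens, and a linear layer maps the result to the LLM's embedding width. This is an illustrative assumption, not the authors' implementation; the names QueryInterface, num_queries, video_dim and llm_dim are hypothetical. Weights are randomly initialized and training is omitted, so it only shows how a variable number of video tokens can be compressed into a fixed number of LLM-ready embeddings.

```python
import math
import random


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), plain Python lists
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(m):
    # Numerically stable softmax applied to each row independently
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class QueryInterface:
    """Hypothetical query-based interface (assumption, not VideoChat's code).

    num_queries learnable vectors attend over the video tokens; the pooled
    result is projected to the LLM embedding width llm_dim.
    """

    def __init__(self, num_queries, video_dim, llm_dim, seed=0):
        rng = random.Random(seed)
        self.queries = rand_matrix(num_queries, video_dim, rng)  # would be learned
        self.w_k = rand_matrix(video_dim, video_dim, rng)        # key projection
        self.w_v = rand_matrix(video_dim, video_dim, rng)        # value projection
        self.w_out = rand_matrix(video_dim, llm_dim, rng)        # map to LLM width
        self.scale = 1.0 / math.sqrt(video_dim)

    def __call__(self, video_tokens):
        keys = matmul(video_tokens, self.w_k)                    # T x video_dim
        values = matmul(video_tokens, self.w_v)                  # T x video_dim
        scores = matmul(self.queries, transpose(keys))           # Q x T
        scores = [[s * self.scale for s in row] for row in scores]
        attn = softmax_rows(scores)                              # each row sums to 1
        pooled = matmul(attn, values)                            # Q x video_dim
        return matmul(pooled, self.w_out)                        # Q x llm_dim


# Illustrative sizes only: 32 video tokens (e.g. 8 frames x 4 patches) of width 16
rng = random.Random(1)
video_tokens = rand_matrix(32, 16, rng, scale=1.0)
interface = QueryInterface(num_queries=4, video_dim=16, llm_dim=24)
llm_inputs = interface(video_tokens)
print(len(llm_inputs), len(llm_inputs[0]))  # 4 24
```

The output size depends only on num_queries and llm_dim, not on how many video tokens go in, which is one reason such a fixed-length interface is convenient for feeding video into a language model's limited context.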

Code Repositories

opengvlab/ask-anything (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
question-answering-on-next-qa-open-ended | VideoChat | Accuracy: 56.6; Confidence Score: 3.2
video-based-generative-performance | Video Chat | Consistency: 2.24; Contextual Understanding: 2.53; Correctness of Information: 2.23; Detail Orientation: 2.50; Temporal Understanding: 1.94; mean: 2.29
video-based-generative-performance-1 | Video Chat | gpt-score: 2.32
video-based-generative-performance-2 | Video Chat | gpt-score: 2.24
video-based-generative-performance-3 | Video Chat | gpt-score: 2.53
video-based-generative-performance-4 | Video Chat | gpt-score: 2.50
video-based-generative-performance-5 | Video Chat | gpt-score: 1.94
video-question-answering-on-activitynet-qa | Video Chat | Accuracy: 26.5; Confidence Score: 2.2
video-question-answering-on-mvbench | VideoChat | Avg.: 35.5
zeroshot-video-question-answer-on-activitynet | Video Chat | Accuracy: 26.5; Confidence Score: 2.2
zeroshot-video-question-answer-on-msrvtt-qa | Video Chat-7B | Accuracy: 45.0; Confidence Score: 2.5
zeroshot-video-question-answer-on-msvd-qa | Video Chat-7B | Accuracy: 56.3; Confidence Score: 2.8
zeroshot-video-question-answer-on-tgif-qa | Video Chat-7B | Accuracy: 34.4; Confidence Score: 2.3
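
The mean of 2.29 listed for video-based-generative-performance appears to be the unweighted average of the five per-dimension scores listed with it. The short check below reproduces that figure under this assumption; the scores are copied from the benchmark listing and the variable names are illustrative.

```python
# Per-dimension scores listed for VideoChat on video-based-generative-performance
scores = {
    "Correctness of Information": 2.23,
    "Detail Orientation": 2.50,
    "Contextual Understanding": 2.53,
    "Temporal Understanding": 1.94,
    "Consistency": 2.24,
}

# Assumption: "mean" is a plain unweighted average over the five dimensions
mean_score = sum(scores.values()) / len(scores)
print(f"{mean_score:.2f}")  # 2.29
```

The exact average is 11.44 / 5 = 2.288, which rounds to the reported 2.29.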
