4 months ago

Video Instruction Tuning With Synthetic Data

Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li

Abstract

The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web. To address this, we propose an alternative approach by creating a high-quality synthetic dataset specifically for video instruction-following, namely LLaVA-Video-178K. This dataset includes key tasks such as detailed captioning, open-ended question-answering (QA), and multiple-choice QA. By training on this dataset, in combination with existing visual instruction tuning data, we introduce LLaVA-Video, a new video LMM. Our experiments demonstrate that LLaVA-Video achieves strong performance across various video benchmarks, highlighting the effectiveness of our dataset. We plan to release the dataset, its generation pipeline, and the model checkpoints.

Benchmarks

Benchmark | Methodology | Metrics
video-question-answering-on-next-qa | LLaVA-Video | Accuracy: 83.2
video-question-answering-on-tvbench | LLaVA-Video 7B | Average Accuracy: 45.6
video-question-answering-on-tvbench | LLaVA-Video 72B | Average Accuracy: 50.0
visual-question-answering-vqa-on-vlm2-bench | LLaVA-Video-7B | Average Score on VLM2-bench (9 subtasks): 43.32; GC-mat: 18.53; GC-trk: 12.79; OC-cnt: 62.47; OC-cpr: 54.72; OC-grp: 28.50; PC-VID: 59.00; PC-cnt: 66.91; PC-cpr: 62.00; PC-grp: 25.00
zero-shot-video-question-answer-on-zero-shot | LLaVA-Video | Accuracy (%): 61.9
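
As a quick consistency check on the VLM2-bench row, the reported average score can be recomputed from the nine subtask scores listed above. The sketch below assumes the average is a plain unweighted mean over the subtasks; the listing does not state the aggregation rule, and the variable names are illustrative.

```python
# Subtask scores for LLaVA-Video-7B on VLM2-bench, copied from the listing above.
subtask_scores = {
    "GC-mat": 18.53,
    "GC-trk": 12.79,
    "OC-cnt": 62.47,
    "OC-cpr": 54.72,
    "OC-grp": 28.50,
    "PC-VID": 59.00,
    "PC-cnt": 66.91,
    "PC-cpr": 62.00,
    "PC-grp": 25.00,
}

# Assumption: the reported average is an unweighted mean over the 9 subtasks.
average = sum(subtask_scores.values()) / len(subtask_scores)

print(f"{average:.2f}")  # prints 43.32
```

Under that assumption the recomputed value agrees with the reported 43.32.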
