AutoCaption Video Caption Benchmark Dataset
The AutoCaption dataset is a video captioning benchmark released by Tjunlp Lab in 2025, accompanying the paper "Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search". It aims to advance research on multimodal large language models in the field of video caption generation.
Dataset structure:
The dataset contains 2 subsets, with a total of 11,184 samples:
- sft_data: 9,419 samples of supervised fine-tuning data for caption models
- mcts_vcb: 1,765 samples for the MCTS-VCB benchmark, evaluated against MCTS-generated captions and key points
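The subset sizes above can be sketched as a small Python structure; note that the subset names and sample counts come from this card, while the dict-based layout is only an illustrative assumption, not the dataset's actual on-disk format.

```python
# Subset names and sample counts as stated on the dataset card.
# The dict layout here is illustrative, not the dataset's real file format.
subsets = {
    "sft_data": 9_419,   # supervised fine-tuning data for caption models
    "mcts_vcb": 1_765,   # MCTS-VCB benchmark evaluation samples
}

total = sum(subsets.values())
print(total)  # 11184, matching the stated total
```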
 