AutoCaption Video Caption Benchmark Dataset
The AutoCaption dataset is a video captioning benchmark released by Tjunlp Lab in 2025, accompanying the paper "Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search". It aims to advance research on multimodal large language models in the field of video caption generation.
Dataset structure:
The dataset contains 2 subsets, with a total of 11,184 samples:
- sft_data: 9,419 samples of supervised fine-tuning data for caption models
- mcts_vcb: 1,765 samples for the MCTS-VCB benchmark, evaluated against MCTS-generated captions and key points
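The subset sizes above can be sketched as a small Python structure; note that the subset names and sample counts come from this card, while the dict-based layout is only an illustrative assumption, not the dataset's actual on-disk format.

```python
# Subset names and sample counts as stated on the dataset card.
# The dict layout here is illustrative, not the dataset's real file format.
subsets = {
    "sft_data": 9_419,   # supervised fine-tuning data for caption models
    "mcts_vcb": 1_765,   # MCTS-VCB benchmark evaluation samples
}

total = sum(subsets.values())
print(total)  # 11184, matching the stated total
```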
 