Paper2Video: Paper-Video Benchmark Dataset
Date:
Size:
Paper URL:
License: MIT
This dataset supports online use.
Paper2Video is the first benchmark dataset of paper-video pairs, released by the National University of Singapore in 2025 together with the paper "Paper2Video: Automatic Video Generation from Scientific Papers". It aims to provide a standard benchmark and evaluation resource for the task of automatically generating presentation videos (including slides, subtitles, voice, and speaker avatars) from academic papers.
The dataset contains 101 paper-video pairs. Each paper averages approximately 28.7 pages, 13,300 words, and 44.7 figures. Each video averages approximately 6 minutes 15 seconds, with the longest running about 14 minutes, and includes an average of 16 slides. In addition to the paper and video, each sample includes paper metadata (title, link, conference, and year), an image of the speaker, and a voice sample.
Data composition
- Metadata file: contains fields such as the paper title (paper), paper link (paper_link), presentation video link (presentation_link), conference name (conference), and year (year) for each sample; see the loading sketch after this list.
- Author identity files, which can be used for tasks such as personalized speech synthesis, speaker rendering, and avatar video generation:
  - A reference image of each author (e.g., ref_img.png)
  - A voice sample for each author (e.g., ref_audio.wav)
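The snippet below is a minimal sketch of how the per-sample metadata might be consumed, assuming the metadata file is distributed as JSON with the fields listed above. The file name metadata.json and the directory layout are illustrative assumptions, not confirmed details of the release.

```python
import json
from pathlib import Path

# Assumed layout for illustration; adjust to the actual release structure.
DATA_ROOT = Path("Paper2Video")
META_FILE = DATA_ROOT / "metadata.json"  # assumed file name and JSON format

def load_samples(meta_file: Path) -> list[dict]:
    """Load the per-paper metadata records described on this card."""
    with meta_file.open(encoding="utf-8") as f:
        return json.load(f)

if __name__ == "__main__":
    for sample in load_samples(META_FILE):
        # Fields named on this card: paper, paper_link,
        # presentation_link, conference, year.
        print(sample["paper"], sample["conference"], sample["year"])
```

The same pattern extends to resolving each sample's author identity files (e.g., ref_img.png and ref_audio.wav) relative to a per-sample directory, once the actual release layout is known.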
 
 