Paper2Video: Paper-Video Benchmark Dataset
Date:
Size:
Paper URL:
License: MIT
This dataset supports online use.
Paper2Video is the first benchmark dataset of paper-video pairs, released by the National University of Singapore in 2025 together with the paper "Paper2Video: Automatic Video Generation from Scientific Papers". It aims to provide a standard benchmark and evaluation resource for the task of automatically generating presentation videos (including slides, subtitles, voice, and speaker avatars) from academic papers.
The dataset contains 101 paper-video pairs. Each paper averages approximately 28.7 pages, 13,300 words, and 44.7 figures. Each video averages approximately 6 minutes 15 seconds, with the longest running about 14 minutes, and includes an average of 16 slides. In addition to the paper and video, each sample includes paper metadata (title, link, conference, and year), an image of the speaker, and a voice sample.
Data composition
- Metadata file: contains fields such as the paper title (paper), paper link (paper_link), presentation video link (presentation_link), conference name (conference), and year (year) for each sample; see the loading sketch after this list.
- Author identity files, which can be used for tasks such as personalized speech synthesis, speaker rendering, and avatar video generation:
  - A reference image of each author (e.g., ref_img.png)
  - A voice sample for each author (e.g., ref_audio.wav)
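The snippet below is a minimal sketch of how the per-sample metadata might be consumed, assuming the metadata file is distributed as JSON with the fields listed above. The file name metadata.json and the directory layout are illustrative assumptions, not confirmed details of the release.

```python
import json
from pathlib import Path

# Assumed layout for illustration; adjust to the actual release structure.
DATA_ROOT = Path("Paper2Video")
META_FILE = DATA_ROOT / "metadata.json"  # assumed file name and JSON format

def load_samples(meta_file: Path) -> list[dict]:
    """Load the per-paper metadata records described on this card."""
    with meta_file.open(encoding="utf-8") as f:
        return json.load(f)

if __name__ == "__main__":
    for sample in load_samples(META_FILE):
        # Fields named on this card: paper, paper_link,
        # presentation_link, conference, year.
        print(sample["paper"], sample["conference"], sample["year"])
```

The same pattern extends to resolving each sample's author identity files (e.g., ref_img.png and ref_audio.wav) relative to a per-sample directory, once the actual release layout is known.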
 
 