Multimodal-Textbook-6.5M Multimodal Textbook Dataset

Date

4 months ago

Organization

Paper URL

arxiv.org


Multimodal-Textbook-6.5M is a multimodal textbook dataset released by Alibaba DAMO Academy in 2025. It accompanies the paper "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" and aims to enhance multimodal pre-training and strengthen a model's ability to handle interleaved visual and textual inputs.

The dataset contains 6.5 million images and 800 million text tokens, all extracted from online teaching videos totaling 22,000 class hours. It covers six basic subjects, including mathematics, physics, and chemistry, and provides more coherent context and richer knowledge for image-text alignment.
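To give a feel for the scale quoted above, the following minimal sketch derives average densities from the three headline figures. It uses only the numbers stated in the description; treating the 800 million text figure as a token count is an assumption, and the variable and function names are illustrative rather than part of any dataset tooling.

```python
# Headline figures quoted in the dataset description.
NUM_IMAGES = 6_500_000
NUM_TEXT_UNITS = 800_000_000  # assumption: interpreted as a token count
CLASS_HOURS = 22_000


def per_class_hour(total: int, hours: int = CLASS_HOURS) -> float:
    """Average number of items per class hour of video."""
    return total / hours


images_per_hour = per_class_hour(NUM_IMAGES)
text_per_hour = per_class_hour(NUM_TEXT_UNITS)
text_per_image = NUM_TEXT_UNITS / NUM_IMAGES

print(f"{images_per_hour:.0f} images per class hour")       # 295
print(f"{text_per_hour:.0f} text units per class hour")     # 36364
print(f"{text_per_image:.0f} text units per image")         # 123
```

These are simple averages over the whole corpus; the actual density will vary by subject and by video.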

Example of building a dataset from a tutorial video
