Institutional Books 1.0 Book Dataset

Date

5 months ago

Organization

Paper URL

arxiv.org


Institutional Books 1.0 is a growing corpus of public domain books released by Harvard University in 2025. The accompanying paper is "Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability".

The dataset consists of 983,004 public domain books in 254 languages, mainly published in the 19th and 20th centuries. It contains 242 billion tokens across 386 million pages of text and is available in both original and post-processed OCR export formats.
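
To give a rough sense of scale, the headline figures above can be turned into per-book and per-page averages. The sketch below uses only the numbers stated in this description; the token and page counts are rounded in the source, so the results are approximate, and the function and variable names are illustrative rather than part of any dataset tooling.

```python
# Headline figures as stated in the dataset description.
# Token and page counts are rounded in the source, so averages are approximate.
NUM_BOOKS = 983_004
NUM_TOKENS = 242_000_000_000
NUM_PAGES = 386_000_000


def corpus_averages(books: int, tokens: int, pages: int) -> dict[str, float]:
    """Return simple mean statistics derived from corpus-level totals."""
    return {
        "tokens_per_book": tokens / books,
        "pages_per_book": pages / books,
        "tokens_per_page": tokens / pages,
    }


averages = corpus_averages(NUM_BOOKS, NUM_TOKENS, NUM_PAGES)
for name, value in averages.items():
    print(f"{name}: {value:,.0f}")
```

Under these figures, an average book runs to roughly 246,000 tokens over about 390 pages, or around 630 tokens per page. These are means over the whole corpus and say nothing about how book lengths are distributed.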
