Institutional Books 1.0 Book Dataset
Institutional Books 1.0 is a growing corpus of public domain books released by Harvard University in 2025. The accompanying paper is "Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability".
The dataset consists of 983,004 public domain books in 254 languages, mainly published in the 19th and 20th centuries. The dataset has 242 billion tokens, 386 million pages of text, and is available in both original and post-processed OCR export formats.
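To give a sense of scale, the headline figures above can be turned into rough per-book and per-page averages. The following is a minimal sketch that uses only the rounded totals quoted in this description; the function name `corpus_averages` is hypothetical, and the results are back-of-the-envelope estimates, not statistics reported by the dataset authors.

```python
# Rounded totals taken from the dataset description above.
BOOKS = 983_004          # public domain books
TOKENS = 242e9           # total tokens (rounded to the nearest billion)
PAGES = 386e6            # total pages of text (rounded to the nearest million)


def corpus_averages(books: int, tokens: float, pages: float) -> dict:
    """Return rough averages derived from corpus-level totals.

    Because the inputs are rounded, treat the outputs as estimates only.
    """
    return {
        "tokens_per_book": tokens / books,
        "pages_per_book": pages / books,
        "tokens_per_page": tokens / pages,
    }


if __name__ == "__main__":
    for name, value in corpus_averages(BOOKS, TOKENS, PAGES).items():
        print(f"{name}: {value:,.0f}")
```

By this estimate an average book runs to roughly a quarter of a million tokens across a few hundred pages, though the real distribution across 254 languages and two centuries of publishing is likely far from uniform.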