TinyStories Short Story Synthesis Dataset

Date: a year ago
Size: 4.21 GB
Organization: Microsoft Research
Paper URL: arxiv.org

TinyStories is a synthetic dataset of short stories generated by GPT-3.5 and GPT-4, written using only words that a typical 3- to 4-year-old child would understand. It is designed for training and evaluating small language models (LMs). Models trained on it can be very small (fewer than 10 million parameters, according to the paper) or architecturally simple (a single transformer block), yet still produce fluent, consistent, and diverse short stories with almost perfect grammar.

Microsoft Research introduced the TinyStories dataset in 2023 in the paper "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?"

TinyStories.torrent
  • TinyStories/
    • README.md (1.36 KB)
    • README.txt (2.72 KB)
    • data/
      • TinyStories.zip (4.21 GB)
