
Text-to-Image-2M text-to-image Training Dataset

Date: 4 months ago

License: MIT

Text-to-Image-2M is a high-quality text-image pair dataset designed for fine-tuning text-to-image models. Existing public datasets have limitations: many are built for image understanding rather than generation, many are informally collected or task-specific, and most are limited in size. To address these issues, the team combined and enhanced existing high-quality datasets with advanced text-to-image and captioning models to create the Text-to-Image-2M dataset.

The dataset contains about 2 million samples, divided into two core subsets: data_512_2M (2 million 512×512 resolution images with annotations) and data_1024_10K (10,000 1024×1024 high-resolution images with annotations), providing flexible options for training models with different resolution requirements.

Data composition:

  • data_512_2M:
    • LLaVA-next fine-tuning dataset (about 700,000 samples; captions regenerated with Qwen2-VL to improve accuracy)
    • LLaVA pre-training dataset (about 500,000 samples; images regenerated by the Flux-dev model, original text descriptions retained)
    • ProGamerGov synthetic dataset (about 900,000 samples; center-cropped and filtered for validity)
    • GPT-4o generated dataset (100,000 samples; captions written by GPT-4o, images generated by Flux-dev)
  • data_1024_10K:
    • 10,000 high-resolution images, with captions generated by GPT-4o and images rendered by the Flux-dev model, focusing on complex, detail-rich scenes
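
To make the composition above concrete, here is a minimal loading sketch in Python. It assumes the dataset is published on the Hugging Face Hub; the repository id used below (jackyhate/text-to-image-2M) and the per-sample field layout are assumptions, so check the dataset page for the exact keys before relying on them.

```python
# Minimal sketch: stream text-image pairs from one subset of Text-to-Image-2M.
# Assumptions (not stated on this page): the data is hosted on the Hugging Face
# Hub under the repo id "jackyhate/text-to-image-2M", and the subset folders are
# named "data_512_2M" and "data_1024_10K" as in the composition above.
from datasets import load_dataset

ds = load_dataset(
    "jackyhate/text-to-image-2M",  # assumed repository id
    data_dir="data_512_2M",        # or "data_1024_10K" for the high-resolution subset
    split="train",
    streaming=True,                # avoid downloading ~2M samples up front
)

for sample in ds.take(3):
    # Field names depend on how the shards are packaged (e.g. WebDataset keys
    # such as "jpg" and "json"); inspect one sample to find the image/caption keys.
    print(sample.keys())
```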
