ChineseWebText Chinese Web Text Dataset

Date: 2 years ago

Size: 398.86 GB

ChineseWebText is the latest and largest Chinese dataset, containing 1.42 TB of data. Each text is assigned a quality score, so large language model researchers can select data according to their desired quality threshold. A cleaner subset containing 600 GB of Chinese text with quality scores above 90% is also released here. This directory contains the ChineseWebText dataset and the EvalWeb toolchain for processing CommonCrawl data.
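Selecting data by quality score, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the data is stored as JSON Lines, that each record carries a `score` field between 0 and 1 and a `text` field, and that "quality exceeding 90%" corresponds to `score > 0.9`. The field names and the helper `filter_by_quality` are hypothetical, so check the dataset's README for the actual layout before relying on them.

```python
import json
from typing import Iterable, Iterator


def filter_by_quality(lines: Iterable[str], threshold: float = 0.9) -> Iterator[dict]:
    """Yield records whose quality score is strictly above `threshold`.

    Assumes one JSON object per line with a numeric "score" field
    (an assumed field name, not confirmed by the listing).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        score = record.get("score")
        # Records without a score are dropped rather than guessed at.
        if score is not None and score > threshold:
            yield record


# Tiny in-memory sample standing in for lines read from a data file.
sample = [
    '{"text": "高质量文本", "score": 0.97}',
    '{"text": "低质量文本", "score": 0.42}',
    '{"text": "无评分文本"}',
]
kept = list(filter_by_quality(sample))
print(len(kept), "record(s) kept")
```

Because the function takes any iterable of lines, it could be applied to an open file handle so a large file is streamed rather than loaded into memory. Lowering `threshold` trades quality for volume.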

ChineseWebText.torrent
Seeding: 1 | Downloading: 0 | Completed: 198 | Total Downloads: 416
  • ChineseWebText/
    • README.md (1.16 KB)
    • README.txt (2.32 KB)
    • data/
      • C-webtexet.zip (398.86 GB)
