ChildMandarin Children's Chinese Conversation Speech Dataset

Date

6 months ago

Size

3.4 GB

Organization

Publish URL

github.com

Paper URL

arxiv.org

ChildMandarin is a comprehensive Mandarin speech dataset of children aged 3 to 5, released in 2025 by the Zhiyuan Research Institute and the Human Language Technology Laboratory (HLT Lab) of the School of Computer Science at Nankai University. It was built to address the scarcity of Mandarin speech data for this age group. The accompanying paper, "ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5", presents the dataset as a resource for research areas such as children's speech recognition and speaker verification.

Dataset features:

  • Large data size: 41.25 hours of conversational speech from 397 children aged 3 to 5, a scale that compares favorably with similar datasets
  • Wide geographical coverage: data was collected from 22 provinces and cities, giving regional diversity across accents and speech habits
  • Natural and realistic interaction: recordings were collected through parent-guided dialogue, which simulates everyday communication and yields more natural speech
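
To put the size figures in perspective, the short sketch below derives the average amount of speech per child from the two numbers quoted above. The function name is illustrative only, and the result is a simple mean; the actual per-speaker distribution is not stated here.

```python
# Figures quoted in the dataset description above.
TOTAL_HOURS = 41.25
NUM_SPEAKERS = 397


def mean_minutes_per_speaker(total_hours: float, num_speakers: int) -> float:
    """Average amount of speech per child, in minutes (a plain mean)."""
    return total_hours * 60 / num_speakers


avg = mean_minutes_per_speaker(TOTAL_HOURS, NUM_SPEAKERS)
print(f"{avg:.2f} minutes per child on average")  # prints "6.23 minutes per child on average"
```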

ChildMandarin.torrent
  • ChildMandarin/
    • README.md (1.64 KB)
    • README.txt (3.27 KB)
    • data/
      • ChildMandarin.zip (3.4 GB)
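
Once the torrent has finished, a quick way to get a feel for the archive is to tally its entries by file extension without extracting it. The sketch below uses only the Python standard library. The internal layout of ChildMandarin.zip is not described on this page, so the script assumes nothing about folder names or audio formats, and the path in the usage section is an assumption based on the file tree above.

```python
import zipfile
from collections import Counter
from pathlib import Path, PurePosixPath


def count_by_extension(zip_path: str) -> Counter:
    """Count the files inside a zip archive, grouped by lowercase extension."""
    counts: Counter = Counter()
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            suffix = PurePosixPath(info.filename).suffix.lower() or "(none)"
            counts[suffix] += 1
    return counts


if __name__ == "__main__":
    # Assumed download location, taken from the file tree above.
    archive = Path("ChildMandarin/data/ChildMandarin.zip")
    if archive.exists():
        for ext, n in count_by_extension(str(archive)).most_common():
            print(f"{ext}: {n}")
    else:
        print(f"{archive} not found; adjust the path to your download location")
```

Reading only the archive index keeps this fast even for a 3.4 GB file, since no audio data is decompressed.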
