Free Spoken Digit Dataset (FSDD): Spoken Digit Recognition Audio Dataset
License: CC BY-SA 4.0
The Free Spoken Digit Dataset (FSDD) is a simple audio/speech dataset of spoken-digit recordings, stored as wav files sampled at 8 kHz. The recordings have been trimmed to minimize silence at the beginning and end. The dataset is open, so it is expected to grow over time as contributors add recordings.
The FSDD dataset currently includes (as of July 2024):
- 6 different speakers
- 3,000 recordings (50 of each digit per speaker)
- English pronunciations
Files in the dataset are named in the format {digitLabel}_{speakerName}_{index}.wav. For example, the file name 7_jackson_32.wav denotes the recording with index 32 of the digit 7 by speaker jackson.
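The naming format above can be split back into its three fields with a small parser. This is a minimal sketch, not part of the dataset's own tooling: the function name parse_fsdd_name and the Recording type are hypothetical, and the pattern assumes a single-digit label and a speaker name without underscores.

```python
import re
from typing import NamedTuple


class Recording(NamedTuple):
    """Fields encoded in an FSDD file name (hypothetical helper type)."""
    digit: int
    speaker: str
    index: int


# Assumption: the label is one digit (0-9) and speaker names contain no "_".
_NAME_PATTERN = re.compile(r"^(\d)_([^_]+)_(\d+)\.wav$")


def parse_fsdd_name(filename: str) -> Recording:
    """Split '{digitLabel}_{speakerName}_{index}.wav' into typed fields."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not an FSDD-style file name: {filename!r}")
    digit, speaker, index = match.groups()
    return Recording(int(digit), speaker, int(index))


if __name__ == "__main__":
    print(parse_fsdd_name("7_jackson_32.wav"))
    # prints: Recording(digit=7, speaker='jackson', index=32)
```

The digit field can then serve directly as the class label when building a training set, and the speaker field as a grouping key for speaker-independent splits.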
The FSDD dataset is available for academic research, and the community is encouraged to contribute recordings of their own. All contributed recordings should be mono 8 kHz wav files, trimmed to minimize silence.
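Before contributing, the format requirements above (mono, 8 kHz, wav) can be checked with Python's standard wave module. This is a rough sketch under stated assumptions: check_recording is a hypothetical helper, not a tool shipped with the dataset, and it does not verify that silence has been trimmed.

```python
import os
import wave


def check_recording(path, expected_rate=8000):
    """Return a list of format problems; an empty list means the file passes.

    Only the channel count and sampling rate are checked. Silence trimming
    is not verified here.
    """
    problems = []
    # os.fspath lets callers pass either a str or a pathlib.Path.
    with wave.open(os.fspath(path), "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getframerate() != expected_rate:
            problems.append(
                f"expected {expected_rate} Hz, found {wav.getframerate()} Hz"
            )
    return problems
```

A call such as check_recording("7_jackson_32.wav") returns an empty list for a conforming file and a list of human-readable messages otherwise.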