Simple Voice Questions Dataset
Simple Voice Questions (SVQ) is a multilingual dataset of short spoken questions released by Google, and it is a core evaluation component of the Massive Sound Embedding Benchmark (MSEB). It covers 17 languages across 26 regions and approximately 700 speakers, each of whom provided up to 250 speech samples. The languages include Arabic, English, Japanese, Korean, and Hindi. Recording conditions vary and include quiet environments, background voices, and traffic noise. Each recording is labeled with the speaker's gender (female, male, non-binary, or no response). Together, these properties give the dataset high diversity in both language and acoustic scenarios.
Data fields:
- utt_id: A string representing a unique identifier for the recording.
- waveform: Audio data with a sampling rate of 16,000 Hz.
- locale: A string representing the recording region.
- speaker_id: A string representing a unique identifier for the speaker.
- speaker_age: A 32-bit integer representing the speaker's age.
- speaker_gender: A string representing the speaker's gender.
- environment: A string representing the recording environment.
- text: A string representing the text content of the recording.
- topk_salient_terms: A list of strings representing keywords.
- topk_salient_terms_timestamps: A list of floating-point numbers representing the timestamps of the keywords.
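The field list above can be sketched as a typed record, which may help when validating or filtering samples after loading. This is a minimal illustration and not an official loader: the class name `SVQRecord`, the helpers `fits_int32` and `filter_by_environment`, and every value in the example record are hypothetical. It also assumes the waveform has already been decoded into a flat list of mono samples, which the field list does not confirm, and it makes no assumption about how timestamps map onto terms.

```python
from dataclasses import dataclass
from typing import List

# Sampling rate stated in the field list (16,000 samples per second).
SAMPLE_RATE_HZ = 16_000


@dataclass
class SVQRecord:
    """One SVQ sample, mirroring the documented field names."""
    utt_id: str
    waveform: List[float]  # assumption: decoded mono samples
    locale: str
    speaker_id: str
    speaker_age: int  # documented as a 32-bit integer
    speaker_gender: str
    environment: str
    text: str
    topk_salient_terms: List[str]
    topk_salient_terms_timestamps: List[float]

    def duration_seconds(self) -> float:
        """Clip length derived from the sample count and the sampling rate."""
        return len(self.waveform) / SAMPLE_RATE_HZ


def fits_int32(value: int) -> bool:
    """Check that a Python int fits the documented 32-bit signed range."""
    return -2**31 <= value <= 2**31 - 1


def filter_by_environment(records: List[SVQRecord], environment: str) -> List[SVQRecord]:
    """Keep only the records whose environment label matches exactly."""
    return [r for r in records if r.environment == environment]


# Hypothetical record: all values below are placeholders, not real SVQ data.
example = SVQRecord(
    utt_id="utt_0001",
    waveform=[0.0] * SAMPLE_RATE_HZ,  # one second of silence
    locale="placeholder_locale",
    speaker_id="spk_0001",
    speaker_age=30,
    speaker_gender="female",
    environment="placeholder_environment",
    text="placeholder question text",
    topk_salient_terms=["placeholder"],
    topk_salient_terms_timestamps=[0.5],
)

matching = filter_by_environment([example], "placeholder_environment")
```

The actual label strings used for `locale`, `speaker_gender`, and `environment` should be read from the dataset itself before filtering on them.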