AudioSetCaps Audio Caption Dataset
License: CC BY 4.0
The dataset was released in 2024 by researchers from Northwestern Polytechnical University, Xi'an Lianfeng Acoustic Technology Co., Ltd., Nanyang Technological University, University of Surrey, and the Institute of Acoustics, Chinese Academy of Sciences. The accompanying paper, "AudioSetCaps: Enriched Audio Captioning Dataset Generation Using Large Audio Language Models", was accepted at NeurIPS 2024.
AudioSetCaps is an audio-caption dataset containing 6,117,099 10-second audio files. Each audio file is accompanied by a descriptive caption and 3 Q&A pairs, which serve as the metadata used to generate the final caption (18,414,789 Q&A pairs in total).
It was created with an automated generation pipeline built on large audio and language models, using audio from three datasets: AudioSet, YouTube-8M, and VGGSound.
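The per-file layout described above (one caption plus Q&A pairs for each 10-second clip, drawn from one of three source datasets) could be modeled roughly as follows. This is only an illustrative sketch: the class names, field names, and the `validate` helper are hypothetical and are not taken from the dataset's actual release files, whose schema may differ.

```python
from dataclasses import dataclass, field
from typing import List

# The three source datasets named in the description above.
SOURCES = {"AudioSet", "YouTube-8M", "VGGSound"}


@dataclass
class QAPair:
    """One question/answer pair used as metadata for caption generation."""
    question: str
    answer: str


@dataclass
class AudioCaptionRecord:
    """Hypothetical in-memory view of a single AudioSetCaps entry."""
    audio_id: str            # assumed identifier; real key name may differ
    source: str              # expected to be one of SOURCES
    duration_s: float        # clips are described as 10 seconds long
    caption: str             # the final descriptive caption
    qa_pairs: List[QAPair] = field(default_factory=list)


def validate(record: AudioCaptionRecord) -> bool:
    """Check a record against the properties stated in the description."""
    return (
        record.source in SOURCES
        and record.duration_s == 10.0
        and bool(record.caption.strip())
        and len(record.qa_pairs) == 3
    )


# Placeholder values for illustration only; not real dataset content.
example = AudioCaptionRecord(
    audio_id="placeholder-id",
    source="VGGSound",
    duration_s=10.0,
    caption="placeholder caption",
    qa_pairs=[QAPair(f"placeholder question {i}", "placeholder answer") for i in range(3)],
)
```

Calling `validate(example)` checks the placeholder record against the stated shape (a known source, a 10-second clip, a non-empty caption, and three Q&A pairs).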