AudioSetCaps Audio Subtitle Dataset

Date: a year ago
Size: 120.7 MB
Organization: Nanyang Technological University, Northwestern Polytechnical University, University of Surrey
Publish URL: github.com
License: CC BY 4.0

The dataset was released in 2024 by researchers from Northwestern Polytechnical University, Xi'an Lianfeng Acoustic Technology Co., Ltd., Nanyang Technological University, University of Surrey, and the Institute of Acoustics, Chinese Academy of Sciences. The associated paper, "AudioSetCaps: Enriched Audio Captioning Dataset Generation Using Large Audio Language Models", has been accepted by NeurIPS 2024.

AudioSetCaps is an audio-caption dataset containing 6,117,099 audio files, each 10 seconds long. Each file is accompanied by a descriptive caption and 3 Q&A pairs as metadata for generating the final caption (18,414,789 Q&A pairs in total).

It was created with an automated generation pipeline built on large audio and language models, using audio from three datasets: AudioSet, YouTube-8M, and VGGSound.

AudioSetCaps.torrent
  • AudioSetCaps/
    • README.md (1.63 KB)
    • README.txt (3.27 KB)
    • data/
      • AudioSetCaps.zip (120.7 MB)
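
The page does not describe what AudioSetCaps.zip contains, so a reasonable first step after downloading is to list the archive's members before extracting anything. The following is a minimal sketch using Python's standard zipfile module. The helper name list_archive and the local path are assumptions made for illustration; they are not part of the dataset or its documentation.

```python
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return (member name, uncompressed size in bytes) for each file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries, keep only files
        ]


if __name__ == "__main__":
    # Assumed local path, mirroring the torrent's file listing; adjust to your download directory.
    path = Path("AudioSetCaps/data/AudioSetCaps.zip")
    if path.exists():
        for name, size in list_archive(path):
            print(f"{size:>12,d}  {name}")
    else:
        print(f"{path} not found; download the dataset first")
```

Listing the members first shows which file formats are inside without assuming a layout the page does not state.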
