MusicPile Large Music Dataset
MusicPile is a large-scale music-language pre-training dataset jointly released by the Multimodal Art Projection Research Community, Skywork AI, and the Hong Kong University of Science and Technology. It contains 5.17 million samples and approximately 4.16 billion tokens, drawn from online corpora, encyclopedias, music books, YouTube music subtitles, ABC notation works, mathematical content, and code. Each sample has three fields (id, text, and src), and each text is at most 2,048 tokens long. MusicPile covers general music knowledge, question-and-answer pairs on music topics, and standard music theory content, and is intended to improve the music understanding and generation capabilities of large language models.
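The record layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records and the src labels "abc" and "web" are hypothetical, and a whitespace split stands in for the dataset's actual tokenizer, which is not specified here. The helper names is_valid and source_counts are invented for this sketch.

```python
from collections import Counter

# Field names and the token limit come from the dataset description.
REQUIRED_FIELDS = {"id", "text", "src"}
MAX_TOKENS = 2048


def count_tokens(text):
    # Assumption: whitespace splitting approximates the real tokenizer,
    # which will generally give different counts.
    return len(text.split())


def is_valid(record):
    # A record must have exactly the three fields and a text within the limit.
    return (
        set(record) == REQUIRED_FIELDS
        and count_tokens(record["text"]) <= MAX_TOKENS
    )


def source_counts(records):
    # Tally valid records per source label.
    return Counter(r["src"] for r in records if is_valid(r))


# Hypothetical records; the src labels are made up for illustration.
sample = [
    {"id": 0, "text": "X:1\nT:Example\nK:C\nCDEF|", "src": "abc"},
    {"id": 1, "text": "word " * 3000, "src": "web"},  # exceeds the limit
    {"id": 2, "text": "missing src field"},           # wrong schema
]

print(source_counts(sample))
```

A check of this kind is mainly useful as a sanity pass after downloading the data, for example to confirm that the fields match expectations before feeding texts into a pre-training pipeline.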