5 months ago

Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning

Liu Shansong ; Hussain Atin Sakkeer ; Sun Chenshuo ; Shan Ying

Abstract

Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale publicly available music datasets with natural language captions. To address this, we propose the Music Understanding LLaMA (MU-LLaMA), capable of answering music-related questions and generating captions for music files. Our model utilizes audio representations from a pretrained MERT model to extract music features. However, obtaining a suitable dataset for training the MU-LLaMA model remains challenging, as existing publicly accessible audio question answering datasets lack the necessary depth for open-ended music question answering. To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA Dataset designed for answering open-ended music-related questions. The experiments demonstrate that the proposed MU-LLaMA model, trained on our designed MusicQA dataset, achieves outstanding performance in both music question answering and music caption generation across various metrics, outperforming current state-of-the-art (SOTA) models in both fields and offering a promising advancement in the T2M-Gen research field.
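
As a rough illustration of the idea of turning an audio captioning dataset into question-answer pairs, the sketch below pairs each caption with a few fixed questions and asks a text model to answer from the caption alone. Everything here is an assumption made for illustration: the question list, the prompt wording, the function names `build_prompt` and `caption_to_qa_pairs`, and the stub `answer_fn` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

# Hypothetical questions; the paper's actual prompts are not given in this abstract.
QUESTIONS: List[str] = [
    "What instruments can be heard in this music?",
    "What is the mood of this music?",
    "Describe the tempo of this music.",
]


def build_prompt(caption: str, question: str) -> str:
    """Combine a caption and a question into one instruction for a text model."""
    return (
        f"Music description: {caption}\n"
        f"Question: {question}\n"
        "Answer using only the description above."
    )


def caption_to_qa_pairs(
    caption: str,
    answer_fn: Callable[[str], str],
    questions: List[str] = QUESTIONS,
) -> List[Dict[str, str]]:
    """Create one question-answer pair per question for a single caption.

    answer_fn is any callable that maps a prompt to answer text; in practice
    it would wrap a language model, here it is left abstract.
    """
    return [
        {"question": q, "answer": answer_fn(build_prompt(caption, q))}
        for q in questions
    ]


# Stub standing in for a language model, so the sketch runs without one.
def echo_answer(prompt: str) -> str:
    return prompt.splitlines()[0].removeprefix("Music description: ")


pairs = caption_to_qa_pairs("A slow piano melody with soft strings.", echo_answer)
```

Keeping `answer_fn` as a plain callable means the same loop works with any model wrapper, and the resulting pairs can be attached to the audio file that the caption originally described.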

Code Repositories

shansongliu/M2UGen (jax; mentioned in GitHub)
crypto-code/mu-llama (official; pytorch; mentioned in GitHub)
shansongliu/MU-LLaMA (pytorch; mentioned in GitHub)

Benchmarks

Benchmark: music-question-answering-on-musicqa
Methodology: MU-LLaMA
Metrics:
BERT Score: 0.901
BLEU: 0.306
METEOR: 0.385
ROUGE: 0.466
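
To give a sense of what the overlap-based scores above measure, here is a minimal sketch of a unigram ROUGE-style F1 between a generated answer and a reference answer. It assumes whitespace tokenization and lowercase matching, and it uses ROUGE-1 purely as an illustration: the listing does not state which ROUGE variant or which implementation produced the reported number, so this function will not reproduce it.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference string."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative strings, not taken from the MusicQA dataset.
reference = "a slow piano melody with soft strings"
candidate = "a soft piano melody with strings"
score = rouge1_f1(candidate, reference)  # 12/13, about 0.923
```

Because the score ignores word order, the candidate above scores highly even though "soft" has moved; embedding-based metrics such as BERT Score are usually reported alongside overlap metrics for that reason.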
