5 months ago

QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

Abstract

Text-to-music (TTM) generation, which converts textual descriptions into audio, opens up innovative avenues for multimedia creation. Achieving high quality and diversity in this process demands extensive, high-quality data, which are often scarce in available datasets. Most open-source datasets suffer from issues such as low-quality waveforms and low text-audio consistency, hindering the advancement of music generation models. To address these challenges, we propose a novel quality-aware training paradigm for generating high-quality, high-musicality music from large-scale, quality-imbalanced datasets. Additionally, by leveraging unique properties in the latent space of musical signals, we adapt and implement a masked diffusion transformer (MDT) model for the TTM task, showcasing its capacity for quality control and enhanced musicality. Furthermore, we introduce a three-stage caption refinement approach to address the issue of low-quality captions. Experiments show state-of-the-art (SOTA) performance on benchmark datasets including MusicCaps and the Song-Describer Dataset with both objective and subjective metrics. Demo audio samples are available at https://qa-mdt.github.io/; code and pretrained checkpoints are open-sourced at https://github.com/ivcylc/OpenMusic.

Code Repositories

ivcylc/qa-mdt (Official, PyTorch, mentioned in GitHub)
ivcylc/openmusic (Official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
music-generation-on-song-describer-dataset | OpenMusic | FAD VGG: 1.01
text-to-music-generation-on-musiccaps | OpenMusic (QA-MDT) | FAD: 1.65; IS: 2.80; KL_passt: 1.31
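
For readers unfamiliar with the FAD (Fréchet Audio Distance) figures above, the following is a minimal sketch of the underlying Fréchet distance between two Gaussians fitted to embedding sets. It is not the evaluation code used for these benchmarks. The function name `frechet_distance` is hypothetical, and the random arrays merely stand in for embeddings that a pretrained audio model would produce for reference and generated clips.

```python
import numpy as np


def frechet_distance(emb_ref, emb_gen):
    """Frechet distance between Gaussians fitted to two embedding sets.

    emb_ref, emb_gen: arrays of shape (n_samples, dim). In a real FAD
    evaluation these would be audio embeddings; here they are placeholders.
    """
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    cov_r = np.cov(emb_ref, rowvar=False)
    cov_g = np.cov(emb_gen, rowvar=False)
    # Tr((cov_r cov_g)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative in
    # theory; clip small negative values caused by rounding.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


# Placeholder embeddings (assumption: 200 clips, 4-dimensional features).
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 4))
generated = reference + 2.0  # same spread, shifted mean

print(frechet_distance(reference, reference))
print(frechet_distance(reference, generated))
```

Lower values mean the generated distribution lies closer to the reference distribution in embedding space. Two identical sets give a distance near zero, and shifting only the mean increases the distance by the squared length of the shift.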
