8 months ago

Chang Li* Ruoyu Wang* Lijuan Liu Jun Du† Yixuan Sun Zilu Guo Zhengrong Zhang Yuan Jiang Jianqing Gao Feng Ma

Abstract

Text-to-music (TTM) generation, which converts textual descriptions into audio, opens up innovative avenues for multimedia creation. Achieving high quality and diversity in this process demands extensive, high-quality data, which are often scarce in available datasets. Most open-source datasets suffer from issues like low-quality waveforms and low text-audio consistency, hindering the advancement of music generation models. To address these challenges, we propose a novel quality-aware training paradigm for generating high-quality, high-musicality music from large-scale, quality-imbalanced datasets. Additionally, by leveraging unique properties in the latent space of musical signals, we adapt and implement a masked diffusion transformer (MDT) model for the TTM task, showcasing its capacity for quality control and enhanced musicality. Furthermore, we introduce a three-stage caption refinement approach to address the issue of low-quality captions. Experiments show state-of-the-art (SOTA) performance on benchmark datasets including MusicCaps and the Song-Describer Dataset with both objective and subjective metrics. Demo audio samples are available at https://qa-mdt.github.io/, and code and pretrained checkpoints are open-sourced at https://github.com/ivcylc/OpenMusic.

QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
