5 months ago

Efficient Neural Music Generation

Abstract

Music generation has advanced remarkably with the state-of-the-art MusicLM, which comprises a hierarchy of three LMs for semantic, coarse acoustic, and fine acoustic modeling, respectively. Yet sampling with MusicLM requires passing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for real-time generation. Efficient music generation with quality on par with MusicLM remains a significant challenge. In this paper, we present MeLoDy (M for music; L for LM; D for diffusion), an LM-guided diffusion model that generates music audio of state-of-the-art quality while reducing the forward passes required by MusicLM by 95.7% or 99.6% for sampling 10s or 30s of music, respectively. MeLoDy inherits the highest-level LM from MusicLM for semantic modeling, and applies a novel dual-path diffusion (DPD) model and an audio VAE-GAN to efficiently decode the conditioning semantic tokens into a waveform. DPD is proposed to model the coarse and fine acoustics simultaneously by incorporating the semantic information into segments of latents via cross-attention at each denoising step. Our experimental results suggest the superiority of MeLoDy, not only in its practical advantages in sampling speed and infinitely continuable generation, but also in its state-of-the-art musicality, audio quality, and text correlation. Our samples are available at https://Efficient-MeLoDy.github.io/.
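
As a rough illustration of the conditioning idea mentioned in the abstract (semantic information injected into segments of latents via cross-attention), the following is a minimal single-head sketch in NumPy. It is not the paper's DPD implementation: the tensor shapes, the random projection matrices, the residual addition, and the function name cross_attention are all assumptions chosen only to show how segmented latents can attend to a sequence of semantic tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, semantic, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention.

    latents:  (segments, seg_len, d_model)  noisy latents split into segments
    semantic: (num_tokens, d_model)         embedded semantic tokens
    """
    q = latents @ w_q                          # (S, L, d_head) queries from latents
    k = semantic @ w_k                         # (T, d_head) keys from semantic tokens
    v = semantic @ w_v                         # (T, d_head) values from semantic tokens
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (S, L, T) scaled dot products
    weights = softmax(scores, axis=-1)         # each latent position attends over tokens
    return weights @ v, weights                # (S, L, d_head), (S, L, T)

# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
S, L, T, D = 4, 8, 16, 32
latents = rng.normal(size=(S, L, D))
semantic = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

context, weights = cross_attention(latents, semantic, w_q, w_k, w_v)
conditioned = latents + context  # residual update; in a sampler this would run at each denoising step
```

In a diffusion sampler, a step like this would be repeated at every denoising iteration, so the latents are re-conditioned on the same semantic tokens each time.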

Benchmarks

Benchmark | Methodology | Metrics
text-to-music-generation-on-musiccaps | MeLoDy | FAD: 5.41
