8 months ago

Diffusion Model

Audio and Speech Processing

Method/Architecture

Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

Abstract

Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs for semantic, coarse acoustic, and fine acoustic modeling, respectively. Yet, sampling with MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for real-time generation. Efficient music generation with quality on par with MusicLM remains a significant challenge. In this paper, we present MeLoDy (M for music; L for LM; D for diffusion), an LM-guided diffusion model that generates music audio of state-of-the-art quality while reducing the forward passes in MusicLM by 95.7% or 99.6% when sampling 10 s or 30 s of music, respectively. MeLoDy inherits the highest-level LM from MusicLM for semantic modeling, and applies a novel dual-path diffusion (DPD) model and an audio VAE-GAN to efficiently decode the conditioning semantic tokens into waveforms. DPD is proposed to simultaneously model the coarse and fine acoustics by incorporating the semantic information into segments of latents effectively via cross-attention at each denoising step. Our experimental results suggest the superiority of MeLoDy, not only in its practical advantages in sampling speed and infinitely continuable generation, but also in its state-of-the-art musicality, audio quality, and text correlation. Our samples are available at https://Efficient-MeLoDy.github.io/.
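
The abstract says that DPD injects semantic information into segments of latents via cross-attention at each denoising step. As a rough illustration of that general mechanism only, the toy sketch below lets each latent vector in a segment attend over a short sequence of semantic token embeddings. It is not the authors' implementation: the function names (`cross_attention`, `toy_denoise_step`), the blending factor `mix`, the dimensions, and the absence of learned projections are all assumptions made for illustration, and the "denoising step" is a hypothetical stand-in rather than a real diffusion update.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latents, semantic):
    """Scaled dot-product cross-attention without learned projections.

    latents:  list of query vectors (one latent segment)
    semantic: list of key/value vectors (semantic token embeddings)
    Returns (context vectors, attention weights), one row per latent.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    contexts, all_weights = [], []
    for q in latents:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in semantic]
        weights = softmax(scores)
        # Weighted sum of the semantic vectors, dimension by dimension.
        context = [sum(w * v[d] for w, v in zip(weights, semantic))
                   for d in range(dim)]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


def toy_denoise_step(latents, semantic, mix=0.5):
    """Hypothetical stand-in for one denoising step: blend each latent
    with its semantic context. A real model would use a trained network."""
    contexts, _ = cross_attention(latents, semantic)
    return [[(1 - mix) * x + mix * c for x, c in zip(lat, ctx)]
            for lat, ctx in zip(latents, contexts)]


rng = random.Random(0)
dim = 4  # assumed toy embedding size
semantic_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
latent_segment = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]

# Condition on the same semantic tokens at every (toy) denoising step.
for _ in range(2):
    latent_segment = toy_denoise_step(latent_segment, semantic_tokens)
```

The point of the sketch is only the data flow: the semantic tokens stay fixed and are consulted again at every step, while the latents are what gets updated.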
