FLUX that Plays Music

Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang

Abstract

This paper explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation, termed FluxMusic. Generally, following the design of the advanced Flux model (https://github.com/black-forest-labs/flux), we transfer it into a latent VAE space of the mel-spectrogram. It involves first applying a sequence of independent attention operations to the double text-music stream, followed by a stacked single music stream for denoised patch prediction. We employ multiple pre-trained text encoders to sufficiently capture caption semantic information as well as to provide inference flexibility. In between, coarse textual information, in conjunction with time step embeddings, is utilized in a modulation mechanism, while fine-grained textual details are concatenated with the music patch sequence as inputs. Through an in-depth study, we demonstrate that rectified flow training with an optimized architecture significantly outperforms established diffusion methods for the text-to-music task, as evidenced by various automatic metrics and human preference evaluations. Our experimental data, code, and model weights are made publicly available at: https://github.com/feizc/FluxMusic.
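The abstract describes rectified-flow training on VAE mel-spectrogram latents with the transformer predicting the denoising direction from noisy patches, a timestep, and two granularities of text conditioning. The sketch below illustrates that training objective only; it is not the authors' code, and the model class, argument names, and call signature are assumptions. The released repository (https://github.com/feizc/FluxMusic) should be treated as authoritative.

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, mel_latents, text_coarse, text_fine):
    """One hypothetical training step of rectified flow on mel-spectrogram latents.

    mel_latents: clean VAE latents x_1, shape (B, C, H, W)
    text_coarse: pooled caption embedding fed to the modulation mechanism
    text_fine:   token-level caption embeddings concatenated with music patches
    """
    noise = torch.randn_like(mel_latents)                             # x_0 ~ N(0, I)
    t = torch.rand(mel_latents.size(0), device=mel_latents.device)    # t ~ U(0, 1)
    t_ = t.view(-1, 1, 1, 1)

    # Rectified flow interpolates linearly between noise and data along a straight path.
    x_t = (1.0 - t_) * noise + t_ * mel_latents

    # The regression target is the constant velocity of that path, x_1 - x_0.
    target_velocity = mel_latents - noise

    # The transformer predicts the velocity from noisy latents, timestep, and text conditions.
    pred_velocity = model(x_t, t, text_coarse, text_fine)

    return F.mse_loss(pred_velocity, target_velocity)
```

At inference time, a sample would be drawn by starting from Gaussian noise and integrating the predicted velocity field from t = 0 to t = 1, then decoding the resulting latent with the VAE into a mel-spectrogram.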

