5 months ago

Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation

Or Tal, Felix Kreuk, Yossi Adi

Abstract

Recent progress in text-to-music generation has enabled models to synthesize high-quality musical segments, full compositions, and even respond to fine-grained control signals, e.g. chord progressions. State-of-the-art (SOTA) systems differ significantly across many dimensions, such as training datasets, modeling paradigms, and architectural choices. This diversity complicates efforts to evaluate models fairly and pinpoint which design choices most influence performance. While factors like data and architecture are important, in this study we focus exclusively on the modeling paradigm. We conduct a systematic empirical analysis to isolate its effects, offering insights into associated trade-offs and emergent behaviors that can guide future text-to-music generation systems. Specifically, we compare the two arguably most common modeling paradigms: Auto-Regressive decoding and Conditional Flow-Matching. We conduct a controlled comparison by training all models from scratch using identical datasets, training configurations, and similar backbone architectures. Performance is evaluated across multiple axes, including generation quality, robustness to inference configurations, scalability, adherence to both textual and temporally aligned conditioning, and editing capabilities in the form of audio inpainting. This comparative study sheds light on distinct strengths and limitations of each paradigm, providing actionable insights that can inform future architectural and training decisions in the evolving landscape of text-to-music generation. Audio sampled examples are available at: https://huggingface.co/spaces/ortal1602/ARvsFM
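To make the two paradigms named in the abstract concrete, the following is a minimal toy sketch of how inference differs between them; it is not the authors' implementation. Everything in it is an assumption made for illustration: `toy_next_token_probs` and `toy_velocity` are hypothetical stand-ins for trained networks, `VOCAB_SIZE` is an arbitrary codebook size, and the flow-matching part assumes a linear noise-to-data path with a single known target and plain Euler integration.

```python
import math
import random

VOCAB_SIZE = 8  # hypothetical codebook size for discrete audio tokens


def toy_next_token_probs(prefix):
    """Hypothetical stand-in for a trained auto-regressive model.

    Returns a softmax distribution that favors the token following the
    last one in the prefix.
    """
    last = prefix[-1] if prefix else 0
    logits = [2.0 if tok == (last + 1) % VOCAB_SIZE else 0.0
              for tok in range(VOCAB_SIZE)]
    z = sum(math.exp(v) for v in logits)
    return [math.exp(v) / z for v in logits]


def ar_generate(length, rng):
    """Auto-regressive decoding: one model call per generated token."""
    tokens = []
    for _ in range(length):
        probs = toy_next_token_probs(tokens)
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0])
    return tokens


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a trained velocity field.

    For a linear path x_t = (1 - t) * x_0 + t * x_1 with a single known
    target x_1, the velocity that reaches the target is (x_1 - x) / (1 - t).
    """
    return [(tg - xi) / (1.0 - t) for xi, tg in zip(x, target)]


def fm_generate(target, num_steps, rng):
    """Flow-matching sampling: Euler-integrate dx/dt = v(x, t) from t=0 to t=1.

    The whole vector is updated in parallel at each step, and num_steps is
    chosen at inference time rather than fixed by the sequence length.
    """
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so (1 - t) is never zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    print(ar_generate(16, rng))                 # 16 sequential model calls
    print(fm_generate([0.5, -1.0, 2.0], 4, rng))  # 4 parallel update steps
```

In this sketch the cost of auto-regressive decoding grows with the number of tokens generated, while the cost of flow-matching sampling is governed by the chosen number of integration steps, which is one example of the kind of inference configuration whose effect the study evaluates.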
