Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez

Abstract
We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or via upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, both mono and stereo, while being conditioned on textual description or melodic features, allowing better control over the generated output. We conduct extensive empirical evaluation, considering both automatic and human studies, showing the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light on the importance of each of the components comprising MusicGen. Music samples, code, and models are available at https://github.com/facebookresearch/audiocraft
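The abstract's key idea is that the K parallel codebook streams produced by the audio compression model are interleaved into a single sequence, so one transformer LM can model them without a cascade of models. A minimal sketch of one such pattern (a "delay"-style shift, where codebook k is offset by k steps so each position only depends on earlier ones) is given below; the function names and the `PAD` sentinel are illustrative assumptions, not the released AudioCraft API.

```python
# Sketch of a delay-style codebook interleaving pattern: K token streams
# (one per codebook, T frames each) are shifted so a single-stage LM can
# predict them left to right. PAD marks positions with no token.

PAD = -1  # hypothetical placeholder token

def delay_interleave(codes):
    """codes: K lists of T tokens. Returns a K x (T + K - 1) grid where
    codebook k is shifted right by k steps."""
    K, T = len(codes), len(codes[0])
    out = [[PAD] * (T + K - 1) for _ in range(K)]
    for k in range(K):
        for t in range(T):
            out[k][t + k] = codes[k][t]
    return out

def delay_deinterleave(grid, T):
    """Invert the pattern, recovering the original K x T codes."""
    return [[grid[k][t + k] for t in range(T)] for k in range(len(grid))]
```

For example, two codebooks `[[1, 2, 3], [4, 5, 6]]` interleave to `[[1, 2, 3, PAD], [PAD, 4, 5, 6]]`, and de-interleaving restores the original streams. The appeal of such patterns is that sequence length grows additively (T + K - 1) rather than multiplicatively (K * T) as in full flattening.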
Benchmarks
| Benchmark | Model | FAD ↓ | KL (PaSST) ↓ | FD (OpenL3) ↓ |
|---|---|---|---|---|
| text-to-music-generation-on-musiccaps | MusicGen w/ random melody (1.5B) | 5.0 | 1.31 | |
| text-to-music-generation-on-musiccaps | MusicGen w/o melody (3.3B) | 3.8 | 1.31 | 197.12 |
| text-to-music-generation-on-musiccaps | MusicGen w/o melody (1.5B) | 3.4 | 1.23 | |