2 months ago

Recomposer: Event-roll-guided generative audio editing

Daniel P. W. Ellis Eduardo Fonseca Ron J. Weiss Kevin Wilson Scott Wisdom et al

Abstract

Editing complex real-world sound scenes is difficult because individual sound sources overlap in time. Generative models can fill-in missing or corrupted details based on their strong prior understanding of the data domain. We present a system for editing individual sound events within complex scenes able to delete, insert, and enhance individual sound events based on textual edit descriptions (e.g., enhance Door'') and a graphical representation of the event timing derived from anevent roll'' transcription. We present an encoder-decoder transformer working on SoundStream representations, trained on synthetic (input, desired output) audio example pairs formed by adding isolated sound events to dense, real-world backgrounds. Evaluation reveals the importance of each part of the edit descriptions -- action, class, timing. Our work demonstrates ``recomposition'' is an important and practical application.

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started

Hyper Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

Recomposer: Event-roll-guided generative audio editing

Daniel P. W. Ellis Eduardo Fonseca Ron J. Weiss Kevin Wilson Scott Wisdom et al

Abstract

Build AI with AI

Hyper Newsletters