CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction

Ma Hao; Peng Zhiyuan; Li Xu; Shao Mingjie; Wu Xixin; Liu Ju

Abstract

Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings. This can be achieved by language-queried target sound extraction (TSE), which typically consists of two components: a query network that converts user queries into conditional embeddings, and a separation network that extracts the target sound accordingly. Existing methods commonly train models from scratch. As a consequence, substantial data and computational resources are required to make the randomly initialized model comprehend sound events and perform separation accordingly. In this paper, we propose to integrate pre-trained models into TSE models to address the above issue. To be specific, we tailor and adapt the powerful contrastive language-audio pre-trained model (CLAP) for USS, denoted as CLAPSep. CLAPSep also accepts flexible user inputs, taking both positive and negative user prompts of uni- and/or multi-modalities for target sound extraction. These key features of CLAPSep can not only enhance the extraction performance but also improve the versatility of its application. We provide extensive experiments on 5 diverse datasets to demonstrate the superior performance and zero- and few-shot generalizability of our proposed CLAPSep with fast training convergence, surpassing previous methods by a significant margin. Full codes and some audio examples are released for reproduction and evaluation.
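
The two-component design described in the abstract (a query network producing conditional embeddings, plus a separation network conditioned on them) can be illustrated with a small PyTorch sketch. This is not the authors' architecture: the toy encoder/decoder, the FiLM-style conditioning, and the subtraction of a negative from a positive query embedding are assumptions made only to show how positive/negative prompts might steer a mask-based separator.

```python
# Illustrative sketch (not the CLAPSep implementation): a query-conditioned
# separator in which a CLAP-style embedding modulates a toy mask estimator
# through FiLM (feature-wise scale and shift). All sizes are assumptions.
import torch
import torch.nn as nn


class QueryConditionedSeparator(nn.Module):
    def __init__(self, n_freq: int = 513, embed_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Linear(n_freq, hidden)       # toy mixture encoder
        self.film = nn.Linear(embed_dim, 2 * hidden)   # FiLM: scale and shift
        self.decoder = nn.Linear(hidden, n_freq)       # toy mask decoder

    def forward(self, mix_spec, pos_embed, neg_embed=None):
        # mix_spec: (batch, frames, n_freq) magnitude spectrogram
        # pos_embed / neg_embed: (batch, embed_dim) query embeddings
        query = pos_embed if neg_embed is None else pos_embed - neg_embed
        h = torch.relu(self.encoder(mix_spec))
        scale, shift = self.film(query).chunk(2, dim=-1)
        h = h * scale.unsqueeze(1) + shift.unsqueeze(1)  # condition every frame
        mask = torch.sigmoid(self.decoder(h))            # per-bin soft mask
        return mask * mix_spec                           # estimated target spectrogram


# Toy usage with random tensors standing in for real features and embeddings.
model = QueryConditionedSeparator()
mix = torch.rand(2, 100, 513)   # 2 mixtures, 100 frames
pos = torch.randn(2, 512)       # "extract this" query embedding
neg = torch.randn(2, 512)       # "suppress this" query embedding
est = model(mix, pos, neg)
print(est.shape)                # torch.Size([2, 100, 513])
```

In the actual system the query embeddings would come from a pre-trained CLAP text or audio encoder rather than random tensors, which is precisely the reuse of pre-trained knowledge the paper advocates.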

Code Repositories

aisaka0v0/clapsep (official PyTorch implementation)

Benchmarks

Benchmark                              Methodology   Metrics
target-sound-extraction-on-audiocaps   CLAPSep       SDRi: 10.08, SI-SDRi: 9.40
target-sound-extraction-on-audioset    CLAPSep       SDRi: 9.29, SI-SDRi: 8.44
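
The reported metrics, SDRi and SI-SDRi, measure the improvement in (scale-invariant) signal-to-distortion ratio of the separated signal over the unprocessed mixture, in dB. Below is a minimal NumPy sketch of SI-SDR and SI-SDRi using the standard definitions; it is illustrative and not taken from the CLAPSep codebase.

```python
# Minimal sketch of SI-SDR / SI-SDRi using the standard definitions.
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to remove the gain ambiguity.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10(np.sum(target**2) / np.sum(noise**2))


def si_sdr_improvement(estimate, reference, mixture) -> float:
    """SI-SDRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy usage: a noisy mixture versus a slightly cleaner estimate.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
mix = ref + 0.5 * rng.standard_normal(16000)
est = ref + 0.2 * rng.standard_normal(16000)
print(round(si_sdr_improvement(est, ref, mix), 2))  # positive dB improvement
```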
