CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction

Ma Hao; Peng Zhiyuan; Li Xu; Shao Mingjie; Wu Xixin; Liu Ju

Abstract

Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings. This can be achieved by language-queried target sound extraction (TSE), which typically consists of two components: a query network that converts user queries into conditional embeddings, and a separation network that extracts the target sound accordingly. Existing methods commonly train models from scratch. As a consequence, substantial data and computational resources are required to make the randomly initialized model comprehend sound events and perform separation accordingly. In this paper, we propose to integrate pre-trained models into TSE models to address the above issue. To be specific, we tailor and adapt the powerful contrastive language-audio pre-trained model (CLAP) for USS, denoted as CLAPSep. CLAPSep also accepts flexible user inputs, taking both positive and negative user prompts of uni- and/or multi-modalities for target sound extraction. These key features of CLAPSep can not only enhance the extraction performance but also improve the versatility of its application. We provide extensive experiments on 5 diverse datasets to demonstrate the superior performance and zero- and few-shot generalizability of our proposed CLAPSep with fast training convergence, surpassing previous methods by a significant margin. Full codes and some audio examples are released for reproduction and evaluation.
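
The two-component design described in the abstract (a query network producing conditional embeddings, plus a separation network conditioned on them) can be illustrated with a small PyTorch sketch. This is not the authors' architecture: the toy encoder/decoder, the FiLM-style conditioning, and the subtraction of a negative from a positive query embedding are assumptions made only to show how positive/negative prompts might steer a mask-based separator.

```python
# Illustrative sketch (not the CLAPSep implementation): a query-conditioned
# separator in which a CLAP-style embedding modulates a toy mask estimator
# through FiLM (feature-wise scale and shift). All sizes are assumptions.
import torch
import torch.nn as nn


class QueryConditionedSeparator(nn.Module):
    def __init__(self, n_freq: int = 513, embed_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Linear(n_freq, hidden)       # toy mixture encoder
        self.film = nn.Linear(embed_dim, 2 * hidden)   # FiLM: scale and shift
        self.decoder = nn.Linear(hidden, n_freq)       # toy mask decoder

    def forward(self, mix_spec, pos_embed, neg_embed=None):
        # mix_spec: (batch, frames, n_freq) magnitude spectrogram
        # pos_embed / neg_embed: (batch, embed_dim) query embeddings
        query = pos_embed if neg_embed is None else pos_embed - neg_embed
        h = torch.relu(self.encoder(mix_spec))
        scale, shift = self.film(query).chunk(2, dim=-1)
        h = h * scale.unsqueeze(1) + shift.unsqueeze(1)  # condition every frame
        mask = torch.sigmoid(self.decoder(h))            # per-bin soft mask
        return mask * mix_spec                           # estimated target spectrogram


# Toy usage with random tensors standing in for real features and embeddings.
model = QueryConditionedSeparator()
mix = torch.rand(2, 100, 513)   # 2 mixtures, 100 frames
pos = torch.randn(2, 512)       # "extract this" query embedding
neg = torch.randn(2, 512)       # "suppress this" query embedding
est = model(mix, pos, neg)
print(est.shape)                # torch.Size([2, 100, 513])
```

In the actual system the query embeddings would come from a pre-trained CLAP text or audio encoder rather than random tensors, which is precisely the reuse of pre-trained knowledge the paper advocates.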

Code Repositories

aisaka0v0/clapsep (official PyTorch implementation)

Benchmarks

Benchmark                              Methodology   Metrics
target-sound-extraction-on-audiocaps   CLAPSep       SDRi: 10.08, SI-SDRi: 9.40
target-sound-extraction-on-audioset    CLAPSep       SDRi: 9.29, SI-SDRi: 8.44
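
The reported metrics, SDRi and SI-SDRi, measure the improvement in (scale-invariant) signal-to-distortion ratio of the separated signal over the unprocessed mixture, in dB. Below is a minimal NumPy sketch of SI-SDR and SI-SDRi using the standard definitions; it is illustrative and not taken from the CLAPSep codebase.

```python
# Minimal sketch of SI-SDR / SI-SDRi using the standard definitions.
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to remove the gain ambiguity.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10(np.sum(target**2) / np.sum(noise**2))


def si_sdr_improvement(estimate, reference, mixture) -> float:
    """SI-SDRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy usage: a noisy mixture versus a slightly cleaner estimate.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
mix = ref + 0.5 * rng.standard_normal(16000)
est = ref + 0.2 * rng.standard_normal(16000)
print(round(si_sdr_improvement(est, ref, mix), 2))  # positive dB improvement
```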
