MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection

Pengfei Cai, Yan Song, Kang Li, Haoyu Song, Ian McLoughlin


Abstract

Sound event detection (SED) methods that leverage a large pre-trained Transformer encoder network have shown promising performance in recent DCASE challenges. However, they still rely on an RNN-based context network to model temporal dependencies, largely due to the scarcity of labeled data. In this work, we propose a pure Transformer-based SED model with masked-reconstruction based pre-training, termed MAT-SED. Specifically, a Transformer with relative positional encoding is first designed as the context network, pre-trained by the masked-reconstruction task on all available target data in a self-supervised way. Both the encoder and the context network are jointly fine-tuned in a semi-supervised manner. Furthermore, a global-local feature fusion strategy is proposed to enhance the localization capability. Evaluation of MAT-SED on DCASE2023 task4 surpasses state-of-the-art performance, achieving 0.587/0.896 PSDS1/PSDS2 respectively.
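To make the masked-reconstruction idea concrete, the snippet below is a minimal PyTorch sketch, not the authors' implementation: it masks a random subset of frame-level features produced by a pre-trained encoder and trains a Transformer context network to reconstruct the masked frames. The class name, the 0.75 mask ratio, the layer sizes, and the use of PyTorch's standard encoder (which lacks the relative positional encoding used in the paper) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class MaskedReconstructionPretrainer(nn.Module):
    """Minimal sketch of masked-reconstruction pre-training for a
    Transformer context network (illustrative, not the MAT-SED code)."""

    def __init__(self, feat_dim=768, n_layers=3, n_heads=8, mask_ratio=0.75):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.context_net = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mask_token = nn.Parameter(torch.zeros(feat_dim))
        self.mask_ratio = mask_ratio
        self.loss_fn = nn.MSELoss()

    def forward(self, feats):
        # feats: (batch, time, feat_dim) frame-level features from a
        # pre-trained encoder (e.g. a large audio Transformer).
        b, t, _ = feats.shape
        # Randomly choose frames to mask.
        mask = torch.rand(b, t, device=feats.device) < self.mask_ratio
        masked = feats.clone()
        masked[mask] = self.mask_token
        # The context network predicts the original features of masked frames.
        recon = self.context_net(masked)
        # Reconstruction loss is computed only on masked positions.
        return self.loss_fn(recon[mask], feats[mask])

# Example usage with dummy frame embeddings (shapes are assumptions):
model = MaskedReconstructionPretrainer()
feats = torch.randn(4, 250, 768)
loss = model(feats)
loss.backward()
```

In this sketch the loss is restricted to masked positions, so the context network must infer missing frames from surrounding context, which is the self-supervised signal the pre-training stage relies on.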

Code Repositories

cai525/transformer4sed (Official, PyTorch)

Benchmarks

Benchmark                        Methodology    Metrics
sound-event-detection-on-desed   MAT-SED        PSDS1: 0.587, PSDS2: 0.896
