5 months ago

Dual Knowledge Distillation for Efficient Sound Event Detection

Xiao Yang; Das Rohan Kumar

Abstract

Sound event detection (SED) is essential for recognizing specific sounds and their temporal locations within acoustic signals. This becomes challenging particularly for on-device applications, where computational resources are limited. To address this issue, we introduce a novel framework, referred to as dual knowledge distillation, for developing efficient SED systems. Our proposed dual knowledge distillation commences with temporal-averaging knowledge distillation (TAKD), utilizing a mean student model derived from the temporal averaging of the student model's parameters. This allows the student model to indirectly learn from a pre-trained teacher model, ensuring stable knowledge distillation. Subsequently, we introduce embedding-enhanced feature distillation (EEFD), which incorporates an embedding distillation layer within the student model to bolster contextual learning. On the DCASE 2023 Task 4A public evaluation dataset, our proposed SED system with dual knowledge distillation, having merely one-third of the baseline model's parameters, demonstrates superior performance in terms of PSDS1 and PSDS2. This highlights the importance of the proposed dual knowledge distillation for compact SED systems, which can be ideal for edge devices.
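
The abstract does not spell out how the mean student's parameters are averaged over time. As a rough illustration only, the sketch below assumes an exponential moving average (EMA), as is common in mean-teacher style training. The function name ema_update, the decay value, and the toy scalar parameters are all hypothetical and are not taken from the paper.

```python
def ema_update(mean_params, student_params, decay=0.999):
    """Return updated mean-student parameters.

    Assumption: temporal averaging is an exponential moving average,
    mean <- decay * mean + (1 - decay) * student. The paper's abstract
    does not confirm this exact rule.
    """
    if mean_params.keys() != student_params.keys():
        raise ValueError("parameter names must match")
    return {
        name: decay * mean_params[name] + (1.0 - decay) * student_params[name]
        for name in mean_params
    }


# Toy example: scalar values stand in for model weights.
student = {"w": 1.0, "b": 0.0}
mean_student = dict(student)  # the mean student starts as a copy of the student

for new_w in (2.0, 3.0):
    # Pretend an optimizer step changed the student's parameters.
    student = {"w": new_w, "b": 0.5}
    # The mean student follows the student slowly instead of jumping to it.
    mean_student = ema_update(mean_student, student, decay=0.9)

print(mean_student)
```

In this toy run the mean student lags behind the student's latest values, which is the kind of smoothing the abstract credits with making distillation more stable.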

Benchmarks

Benchmark	Methodology	Metrics
sound-event-detection-on-desed	SE-CRNN-16 with DualKD	PSDS1: 0.474; PSDS2: 0.698; event-based F1 score: 55.6
