Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models

Florian Schmid, Khaled Koutini, Gerhard Widmer

Abstract

The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks. However, current popular Audio Spectrogram Transformers are demanding in terms of computational complexity compared to CNNs. Recently, we have shown that, by employing Transformer-to-CNN Knowledge Distillation, efficient CNNs can catch up with and even outperform Transformers on large datasets. In this work, we extend this line of research and increase the capacity of efficient CNNs by introducing dynamic CNN blocks, constructed of dynamic non-linearities, dynamic convolutions and attention mechanisms. We show that these dynamic CNNs outperform traditional efficient CNNs, in terms of the performance-complexity trade-off and parameter efficiency, at the task of audio tagging on the large-scale AudioSet. Our experiments further indicate that the introduced dynamic CNNs achieve better performance on downstream tasks and scale up well, attaining Transformer performance and even outperforming them on AudioSet and several downstream tasks.
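The abstract names dynamic convolutions without describing them. As a rough illustration only, the sketch below shows the general idea as it is commonly formulated: several candidate kernels are mixed by input-dependent attention weights, and the mixed kernel is then applied as an ordinary convolution. It is not the authors' implementation. The function names, the scalar pooled context, and the linear attention parameters `attention_weights` and `attention_bias` are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv1d(signal, kernels, attention_weights, attention_bias):
    """Toy 1-D dynamic convolution on a single-channel signal.

    kernels: K candidate kernels of equal length.
    attention_weights, attention_bias: hypothetical parameters of a linear
    layer that maps the pooled context (a scalar here) to K kernel scores.
    """
    # 1) Summarize the input with global average pooling.
    context = sum(signal) / len(signal)

    # 2) Turn the context into attention weights over the K kernels.
    scores = [w * context + b for w, b in zip(attention_weights, attention_bias)]
    alphas = softmax(scores)

    # 3) Aggregate the candidate kernels into one input-dependent kernel.
    size = len(kernels[0])
    kernel = [sum(a * k[i] for a, k in zip(alphas, kernels)) for i in range(size)]

    # 4) Apply the aggregated kernel as a plain sliding-window
    #    (cross-correlation, no padding) convolution.
    out = [
        sum(kernel[j] * signal[t + j] for j in range(size))
        for t in range(len(signal) - size + 1)
    ]
    return out, alphas


signal = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
output, alphas = dynamic_conv1d(signal, kernels, [0.0, 0.0], [0.0, 0.0])
```

With zero attention parameters the two kernels are weighted equally. Non-zero parameters make the kernel mix depend on the input, which is what adds capacity without stacking more layers.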

Code Repositories

fschmid56/efficientat (official implementation, PyTorch)

Benchmarks

Benchmark	Methodology	Metrics
audio-classification-on-audioset	DyMN-L (Audio-Only, Single)	Test mAP: 0.490
audio-classification-on-esc-50	DyMN-L	Accuracy (5-fold): 97.4; Top-1 Accuracy: 97.4; Pre-training dataset: AudioSet
audio-classification-on-fsd50k	MN	mAP: 65.6
audio-classification-on-fsd50k	DyMN-L	mAP: 65.5
audio-tagging-on-audioset	DyMN-L (Audio-Only, Single)	mean average precision: 0.490
instrument-recognition-on-openmic-2018	DyMN-L	mean average precision: 0.855
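
Most rows above report mean average precision (mAP), some as a fraction (0.490) and some as a percentage (65.6). As a reading aid, the sketch below shows one common way to compute mAP for multi-label tagging: average precision per class over the ranked clips, then the mean over classes. It is an illustrative implementation, not the evaluation code behind these benchmarks. Tie handling and the treatment of classes without positives are assumptions, and the function names are hypothetical.

```python
def average_precision(labels, scores):
    """Average precision for one class.

    labels: 0/1 ground truth per clip; scores: predicted score per clip.
    """
    # Rank clips from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the rank of each positive clip.
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(label_matrix, score_matrix):
    """Mean of per-class average precision (rows = clips, columns = classes).

    Classes with no positive clip are skipped (an assumption of this sketch).
    """
    num_classes = len(label_matrix[0])
    per_class = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        if any(labels):
            per_class.append(average_precision(labels, scores))
    return sum(per_class) / len(per_class)


labels = [[1, 0], [0, 1], [1, 0]]
scores = [[0.9, 0.1], [0.8, 0.9], [0.7, 0.2]]
map_value = mean_average_precision(labels, scores)
```

In this toy example the first class has a negative clip ranked between its two positives, so its average precision falls below 1, while the second class is ranked perfectly.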
