End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network
Gazneli Avi ; Zimerman Gadi ; Ridnik Tal ; Sharir Gilad ; Noy Asaf

Abstract
While efficient architectures and a plethora of augmentations for end-to-end image classification tasks have been suggested and heavily investigated, state-of-the-art techniques for audio classifications still rely on numerous representations of the audio signal together with large architectures, fine-tuned from large datasets. By utilizing the inherited lightweight nature of audio and novel audio augmentations, we were able to present an efficient end-to-end network with strong generalization ability. Experiments on a variety of sound classification sets demonstrate the effectiveness and robustness of our approach, by achieving state-of-the-art results in various settings. Public code is available at: https://github.com/Alibaba-MIIL/AudioClassfication
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| audio-classification-on-audioset | EAT-S | Test mAP: 0.405 |
| audio-classification-on-audioset | EAT-M | Test mAP: 0.426 |
| audio-classification-on-esc-50 | EAT-S | Accuracy (5-fold): 95.25; Pre-training dataset: AudioSet; Top-1 Accuracy: 95.25 |
| audio-classification-on-esc-50 | EAT-M | Accuracy (5-fold): 96.3; Pre-training dataset: AudioSet; Top-1 Accuracy: 96.3 |
| audio-classification-on-esc-50 | EAT-S (scratch) | Accuracy (5-fold): 92.15; Top-1 Accuracy: 92.15 |
| keyword-spotting-on-google-speech-commands | EAT-S | Google Speech Commands V2 35: 98.15 |
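
The AudioSet rows above report mean average precision (mAP). As a rough illustration of what that metric measures, the sketch below computes non-interpolated average precision per class and averages it over classes. It assumes the common macro-averaged definition used for multi-label tagging; the paper's own evaluation code is not shown here, and the function names and toy data are hypothetical.

```python
def average_precision(labels, scores):
    """Non-interpolated AP for one class.

    labels: list of 0/1 ground-truth flags, one per clip.
    scores: list of predicted scores for the same clips.
    """
    # Rank clips from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision measured at the rank of each positive clip.
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(label_matrix, score_matrix):
    """Average AP over classes (assumed macro averaging).

    Both arguments are lists of per-clip rows, one column per class.
    Classes with no positive clips are skipped.
    """
    n_classes = len(label_matrix[0])
    aps = []
    for c in range(n_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        if any(labels):
            aps.append(average_precision(labels, scores))
    return sum(aps) / len(aps)


# Toy example: 3 clips, 2 classes (made-up values, not from the paper).
toy_labels = [[1, 0], [0, 1], [1, 1]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6]]
toy_map = mean_average_precision(toy_labels, toy_scores)
```

On the toy data, class 0 has positives at ranks 1 and 3 and class 1 has positives at ranks 1 and 2, so the two per-class APs are 5/6 and 1, which gives a mAP of 11/12.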