MetaAudio: A Few-Shot Audio Classification Benchmark

Calum Heggan; Sam Budgett; Timothy Hospedales; Mehrdad Yaghoobi

Abstract

Currently available benchmarks for few-shot learning (machine learning with few training examples) are limited in the domains they cover, primarily focusing on image classification. This work aims to alleviate this reliance on image-based benchmarks by offering the first comprehensive, public and fully reproducible audio-based alternative, covering a variety of sound domains and experimental settings. We compare the few-shot classification performance of a variety of techniques on seven audio datasets (spanning environmental sounds to human speech). Extending this, we carry out in-depth analyses of joint training (where all datasets are used during training) and cross-dataset adaptation protocols, establishing the possibility of a generalised audio few-shot classification algorithm. Our experimentation shows that gradient-based meta-learning methods such as MAML and Meta-Curvature consistently outperform both metric and baseline methods. We also demonstrate that the joint training routine helps overall generalisation for the environmental sound databases included, as well as being a somewhat effective method of tackling the cross-dataset/domain setting.
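To make the few-shot setting concrete, the sketch below samples a single 5-way 1-shot episode and classifies the query items by their nearest class centroid. It is an illustration only, not the benchmark's code: the function names, the toy two-dimensional features and the number of queries per class are all assumptions, and a real system would use embeddings from a trained audio model.

```python
import math
import random


def sample_episode(features_by_class, n_way=5, k_shot=1, n_query=3, rng=random):
    """Draw one N-way K-shot episode: a labelled support set and a query set."""
    classes = rng.sample(sorted(features_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        items = rng.sample(features_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query


def nearest_centroid_accuracy(support, query):
    """Classify each query item by its closest class centroid (Euclidean)."""
    by_label = {}
    for x, y in support:
        by_label.setdefault(y, []).append(x)
    # Centroid = per-dimension mean of the support vectors of that class.
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in by_label.items()
    }
    correct = 0
    for x, y in query:
        pred = min(centroids, key=lambda c: math.dist(x, centroids[c]))
        correct += pred == y
    return correct / len(query)


# Hypothetical toy data: 8 well-separated classes, 10 two-dimensional items each.
rng = random.Random(0)
toy = {
    f"class_{c}": [[c * 10.0 + rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(10)]
    for c in range(8)
}
support, query = sample_episode(toy, rng=rng)
episode_accuracy = nearest_centroid_accuracy(support, query)
```

With one support item per class, each centroid is simply that item, so the classifier reduces to nearest-neighbour matching against five labelled examples.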

Benchmarks

Benchmark | Methodology | Top-1 Accuracy (5-Way-1-Shot)
few-shot-audio-classification-on | Meta-Baseline (CRNN) | 40.27 ± 0.44
few-shot-audio-classification-on | MAML (CRNN) | 43.45 ± 0.46
few-shot-audio-classification-on | SimpleShot CL2N (AST ImageNet & AudioSet - No fine-tune) | 38.78 ± 0.41
few-shot-audio-classification-on | Prototypical Networks (CRNN) | 39.44 ± 0.44
few-shot-audio-classification-on | Meta-Curvature (CRNN) | 43.18 ± 0.45
few-shot-audio-classification-on | SimpleShot CL2N (AST ImageNet - No fine-tune) | 33.52 ± 0.39
few-shot-audio-classification-on | SimpleShot CL2N (CRNN) | 42.05 ± 0.42
few-shot-audio-classification-on-birdclef | Prototypical Networks (CRNN) | 56.11 ± 0.46
few-shot-audio-classification-on-birdclef | SimpleShot CL2N (AST ImageNet & AudioSet - No fine-tune) | 36.41 ± 0.42
few-shot-audio-classification-on-birdclef | SimpleShot CL2N (CRNN) | 57.66 ± 0.43
few-shot-audio-classification-on-birdclef | SimpleShot CL2N (AST ImageNet - No fine-tune) | 33.04 ± 0.41
few-shot-audio-classification-on-birdclef | MAML (CRNN) | 56.26 ± 0.45
few-shot-audio-classification-on-birdclef | Meta-Curvature (CRNN) | 61.34 ± 0.46
few-shot-audio-classification-on-birdclef | Meta-Baseline (CRNN) | 57.28 ± 0.41
few-shot-audio-classification-on-esc-50 | SimpleShot CL2N (AST ImageNet - No fine-tune) | 60.41 ± 0.41
few-shot-audio-classification-on-esc-50 | Prototypical Networks (CRNN) | 68.83 ± 0.38
few-shot-audio-classification-on-esc-50 | Meta-Curvature (CRNN) | 76.17 ± 0.41
few-shot-audio-classification-on-esc-50 | SimpleShot CL2N (AST ImageNet & AudioSet - No fine-tune) | 64.48 ± 0.41
few-shot-audio-classification-on-esc-50 | MAML (CRNN) | 74.66 ± 0.42
few-shot-audio-classification-on-esc-50 | SimpleShot CL2N (CRNN) | 68.82 ± 0.39
few-shot-audio-classification-on-esc-50 | Meta-Baseline (CRNN) | 71.72 ± 0.38
few-shot-audio-classification-on-nsynth | MAML (CRNN) | 93.85 ± 0.24
few-shot-audio-classification-on-nsynth | SimpleShot CL2N Classifier (AST pre-trained w/ ImageNet - No fine-tune) | 66.68 ± 0.41
few-shot-audio-classification-on-nsynth | Meta-Baseline (CRNN) | 90.74 ± 0.25
few-shot-audio-classification-on-nsynth | SimpleShot CL2N Classifier (AST ImageNet & AudioSet - No fine-tune) | 63.78 ± 0.42
few-shot-audio-classification-on-nsynth | SimpleShot CL2N (CRNN) | 90.04 ± 0.27
few-shot-audio-classification-on-nsynth | Meta-Curvature (CRNN) | 96.47 ± 0.19
few-shot-audio-classification-on-nsynth | Prototypical Networks (CRNN) | 95.23 ± 0.19
few-shot-audio-classification-on-voxceleb1 | Meta-Curvature (CRNN) | 63.85 ± 0.44
few-shot-audio-classification-on-voxceleb1 | SimpleShot CL2N (AST ImageNet - No fine-tune) | 28.09 ± 0.37
few-shot-audio-classification-on-voxceleb1 | Prototypical Networks (CRNN) | 59.64 ± 0.44
few-shot-audio-classification-on-voxceleb1 | Meta-Baseline (CRNN) | 55.54 ± 0.42
few-shot-audio-classification-on-voxceleb1 | MAML (CRNN) | 60.89 ± 0.45
few-shot-audio-classification-on-voxceleb1 | SimpleShot CL2N (CRNN) | 48.50 ± 0.42
few-shot-audio-classification-on-voxceleb1 | SimpleShot CL2N (AST ImageNet & AudioSet - No fine-tune) | 28.79 ± 0.38
few-shot-audio-classification-on-watkins | SimpleShot CL2N (AST ImageNet & AudioSet - No fine-tune) | 51.81 ± 0.42
few-shot-audio-classification-on-watkins | SimpleShot CL2N (AST ImageNet - No fine-tune) | 55.40 ± 0.42
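
The page does not say how the ± figures in the benchmark entries above were obtained. A common convention in few-shot evaluation is to report the mean accuracy over many test episodes together with a 95% confidence interval; the sketch below assumes that convention and a normal approximation. The function name and the four example accuracies are hypothetical and are not taken from the benchmark.

```python
import math
import statistics


def mean_and_ci95(episode_accuracies):
    """Return (mean, half-width of a 95% normal-approximation interval)."""
    mean = statistics.fmean(episode_accuracies)
    # Standard error of the mean, from the sample standard deviation.
    stderr = statistics.stdev(episode_accuracies) / math.sqrt(len(episode_accuracies))
    return mean, 1.96 * stderr


# Hypothetical per-episode accuracies; a real evaluation would use far more episodes.
accs = [0.40, 0.60, 0.40, 0.60]
mean, half_width = mean_and_ci95(accs)
report = f"{100 * mean:.2f} ± {100 * half_width:.2f}"
```

Because the half-width shrinks with the square root of the episode count, tight intervals like those listed would require a large number of test episodes under this assumption.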
