
MetaAudio: A Few-Shot Audio Classification Benchmark

Abstract

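Currently available benchmarks for few-shot learning (machine learning with few training examples) are limited in the domains they cover, focusing primarily on image classification. This work aims to reduce that reliance on image-based benchmarks by offering the first comprehensive, public, and fully reproducible audio benchmark, covering a variety of sound domains and experimental settings. We compare the few-shot classification performance of a range of techniques on seven audio datasets, spanning environmental sounds to human speech. Building on this, we carry out in-depth analyses of joint training (where all datasets are used during training) and of cross-dataset adaptation protocols, establishing the possibility of a general-purpose audio few-shot classification algorithm. Our experiments show that gradient-based meta-learning methods such as MAML and Meta-Curvature consistently outperform both metric and baseline methods. We also show that the joint training routine improves overall generalisation on the environmental sound datasets included, and that it is a somewhat effective way of tackling the cross-dataset/domain setting.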
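To make the evaluation setting concrete, the following is a minimal sketch of one N-way K-shot episode scored with a nearest-centroid classifier on centred, L2-normalised features, in the spirit of the SimpleShot CL2N entries in the benchmark list. It is not the benchmark's implementation: the function names, the `{label: [feature_vector, ...]}` layout, and the assumption that `mean` is the average feature vector of the training data are all illustrative choices.

```python
import math
import random

def sample_episode(data, n_way=5, k_shot=1, n_query=1, rng=None):
    """Draw one N-way K-shot episode from {label: [feature_vector, ...]}."""
    rng = rng or random.Random()
    classes = rng.sample(sorted(data), n_way)
    support, query = [], []
    for label in classes:
        # Disjoint support and query items for each sampled class.
        items = rng.sample(data[label], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query

def cl2n(x, mean):
    """Centre a feature vector with `mean`, then scale it to unit L2 norm."""
    centred = [a - m for a, m in zip(x, mean)]
    norm = math.sqrt(sum(c * c for c in centred)) or 1.0
    return [c / norm for c in centred]

def nearest_centroid_accuracy(support, query, mean):
    """Classify each query by its closest class centroid (Euclidean distance)."""
    by_label = {}
    for x, label in support:
        by_label.setdefault(label, []).append(cl2n(x, mean))
    centroids = {
        label: [sum(col) / len(col) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }
    correct = 0
    for x, label in query:
        z = cl2n(x, mean)
        pred = min(centroids, key=lambda c: math.dist(z, centroids[c]))
        correct += pred == label
    return correct / len(query)
```

In a 5-way 1-shot episode, `support` holds one labelled example for each of five classes, so each centroid is simply that single normalised vector. Repeating the episode many times and averaging the per-episode accuracy gives a figure of the kind reported in the benchmark list.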

Benchmarks

| Benchmark | Method | Top-1 Accuracy (5-Way-1-Shot) |
| --- | --- | --- |
| few-shot-audio-classification-on | Meta-Baseline (CRNN) | 40.27 ± 0.44 |
| few-shot-audio-classification-on | MAML (CRNN) | 43.45 ± 0.46 |
| few-shot-audio-classification-on | SimpleShot CL2N (AST ImageNet & AudioSet - No fine-tune) | 38.78 ± 0.41 |
| few-shot-audio-classification-on | Prototypical Networks (CRNN) | 39.44 ± 0.44 |
| few-shot-audio-classification-on | Meta-Curvature (CRNN) | 43.18 ± 0.45 |
| few-shot-audio-classification-on | SimpleShot CL2N (AST ImageNet - No fine-tune) | 33.52 ± 0.39 |
| few-shot-audio-classification-on | SimpleShot CL2N (CRNN) | 42.05 ± 0.42 |
| few-shot-audio-classification-on-birdclef | Prototypical Networks (CRNN) | 56.11 ± 0.46 |
| few-shot-audio-classification-on-birdclef | SimpleShot CL2N (AST ImageNet & AudioSet - No fine-tune) | 36.41 ± 0.42 |
| few-shot-audio-classification-on-birdclef | SimpleShot CL2N (CRNN) | 57.66 ± 0.43 |
| few-shot-audio-classification-on-birdclef | SimpleShot CL2N (AST ImageNet - No fine-tune) | 33.04 ± 0.41 |
| few-shot-audio-classification-on-birdclef | MAML (CRNN) | 56.26 ± 0.45 |
| few-shot-audio-classification-on-birdclef | Meta-Curvature (CRNN) | 61.34 ± 0.46 |
| few-shot-audio-classification-on-birdclef | Meta-Baseline (CRNN) | 57.28 ± 0.41 |
| few-shot-audio-classification-on-esc-50 | SimpleShot CL2N (AST ImageNet - No fine-tune) | 60.41 ± 0.41 |
| few-shot-audio-classification-on-esc-50 | Prototypical Networks (CRNN) | 68.83 ± 0.38 |
| few-shot-audio-classification-on-esc-50 | Meta-Curvature (CRNN) | 76.17 ± 0.41 |
| few-shot-audio-classification-on-esc-50 | SimpleShot CL2N (AST ImageNet & AudioSet - No fine-tune) | 64.48 ± 0.41 |
| few-shot-audio-classification-on-esc-50 | MAML (CRNN) | 74.66 ± 0.42 |
| few-shot-audio-classification-on-esc-50 | SimpleShot CL2N (CRNN) | 68.82 ± 0.39 |
| few-shot-audio-classification-on-esc-50 | Meta-Baseline (CRNN) | 71.72 ± 0.38 |
| few-shot-audio-classification-on-nsynth | MAML (CRNN) | 93.85 ± 0.24 |
| few-shot-audio-classification-on-nsynth | SimpleShot CL2N Classifier (AST pre-trained w/ ImageNet - No fine-tune) | 66.68 ± 0.41 |
| few-shot-audio-classification-on-nsynth | Meta-Baseline (CRNN) | 90.74 ± 0.25 |
| few-shot-audio-classification-on-nsynth | SimpleShot CL2N Classifier (AST ImageNet & AudioSet - No fine-tune) | 63.78 ± 0.42 |
| few-shot-audio-classification-on-nsynth | SimpleShot CL2N (CRNN) | 90.04 ± 0.27 |
| few-shot-audio-classification-on-nsynth | Meta-Curvature (CRNN) | 96.47 ± 0.19 |
| few-shot-audio-classification-on-nsynth | Prototypical Networks (CRNN) | 95.23 ± 0.19 |
| few-shot-audio-classification-on-voxceleb1 | Meta-Curvature (CRNN) | 63.85 ± 0.44 |
| few-shot-audio-classification-on-voxceleb1 | SimpleShot CL2N (AST ImageNet - No fine-tune) | 28.09 ± 0.37 |
| few-shot-audio-classification-on-voxceleb1 | Prototypical Networks (CRNN) | 59.64 ± 0.44 |
| few-shot-audio-classification-on-voxceleb1 | Meta-Baseline (CRNN) | 55.54 ± 0.42 |
| few-shot-audio-classification-on-voxceleb1 | MAML (CRNN) | 60.89 ± 0.45 |
| few-shot-audio-classification-on-voxceleb1 | SimpleShot CL2N (CRNN) | 48.50 ± 0.42 |
| few-shot-audio-classification-on-voxceleb1 | SimpleShot CL2N (AST ImageNet & AudioSet - No fine-tune) | 28.79 ± 0.38 |
| few-shot-audio-classification-on-watkins | SimpleShot CL2N (AST ImageNet & AudioSet - No fine-tune) | 51.81 ± 0.42 |
| few-shot-audio-classification-on-watkins | SimpleShot CL2N (AST ImageNet - No fine-tune) | 55.40 ± 0.42 |
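The listing does not say what the "±" term denotes. A common convention in few-shot evaluation is a 95% confidence interval over the per-episode accuracies, and the sketch below assumes that convention. The function names and the normal-approximation interval (z = 1.96) are illustrative assumptions, not details taken from the benchmark.

```python
import math
import statistics

def mean_with_ci(accuracies, z=1.96):
    """Return (mean, half-width) of a normal-approximation confidence interval.

    `accuracies` holds one accuracy in [0, 1] per test episode; z=1.96
    corresponds to a 95% interval under the stated assumption.
    """
    mean = statistics.fmean(accuracies)
    # Standard error of the mean, using the sample standard deviation.
    half_width = z * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, half_width

def format_result(accuracies):
    """Format per-episode accuracies as a 'mean ± half-width' percentage string."""
    mean, half_width = mean_with_ci(accuracies)
    return f"{100 * mean:.2f} ± {100 * half_width:.2f}"
```

Because the half-width shrinks with the square root of the number of episodes, narrow intervals like those in the table generally require evaluating on a large number of test episodes.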
