4 months ago

MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations

Abstract

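Contrastive self-supervised learning has gained attention for its ability to produce high-quality representations from large unlabelled datasets. A key reason these features enable data-efficient learning of downstream tasks is that they provide augmentation invariance, which is often a useful inductive bias. However, the amount and type of invariance that a downstream task prefers is not known in advance and varies from task to task. We therefore propose a multi-task self-supervised framework (MT-SLVR) that learns both variant and invariant features in a parameter-efficient manner. Our multi-task representation provides a strong and flexible feature set that benefits diverse downstream tasks. We evaluate our approach on few-shot classification tasks drawn from a variety of audio domains and demonstrate improved classification performance on all of them.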
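To make the idea of learning invariant and variant features together more concrete, the following is a minimal, dependency-free sketch of a two-term objective: a SimCLR-style NT-Xent contrastive loss (which encourages invariance to augmentations) plus a multi-label augmentation prediction loss (which requires the features to retain information about which augmentations were applied). It is not the authors' implementation; the official repository uses PyTorch. The function names, the `weight` and `temperature` parameters, and the plain weighted sum are assumptions made for illustration, and the parallel adapters used for parameter efficiency are not shown.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss.

    z1[i] and z2[i] are embeddings of two augmented views of the same clip.
    Each view must pick out its partner among all other views in the batch.
    """
    z = z1 + z2
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        logits = [_cosine(z[i], z[j]) / temperature
                  for j in range(2 * n) if j != i]
        positive_logit = _cosine(z[i], z[positive]) / temperature
        # Numerically stable log-sum-exp over all non-self candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - positive_logit
    return total / (2 * n)


def augmentation_prediction_loss(logits, applied):
    """Multi-label binary cross-entropy with logits.

    logits[i][k] is the score for "augmentation k was applied to sample i";
    applied[i][k] is 1 if it was applied and 0 otherwise.
    """
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, applied):
        for x, y in zip(row_logits, row_targets):
            # Stable form of -[y*log(s(x)) + (1-y)*log(1-s(x))].
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count


def multitask_loss(z1, z2, aug_logits, aug_targets,
                   weight=1.0, temperature=0.5):
    """Hypothetical combination: contrastive term + weighted prediction term."""
    contrastive = nt_xent_loss(z1, z2, temperature)
    predictive = augmentation_prediction_loss(aug_logits, aug_targets)
    return contrastive + weight * predictive
```

In this sketch the contrastive term drops when two views of the same clip map to similar embeddings, while the prediction term drops only if the augmentation labels can be recovered, so the two terms pull the representation toward invariance and variance respectively.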

Code repository

cheggan/mt-slvr (official, PyTorch, mentioned in GitHub)

Benchmarks

| Benchmark | Method | Top-1 Accuracy (5-Way-1-Shot) |
| --- | --- | --- |
| few-shot-audio-classification-on | MT-SLVR (SimCLR + MLAP) w/ Parallel Adapters (FSD50K, RN18) | 39.11±0.41 |
| few-shot-audio-classification-on | SimCLR (FSD50K, RN18) | 37.64±0.40 |
| few-shot-audio-classification-on | Multi-Label Augmentation Prediction (FSD50K, RN18) | 21.72±0.34 |
| few-shot-audio-classification-on-birdclef | MT-SLVR (SimCLR + MLAP) w/ Parallel Adapters (FSD50K, RN18) | 29.49±0.38 |
| few-shot-audio-classification-on-birdclef | SimCLR (FSD50K, RN18) | 30.93±0.38 |
| few-shot-audio-classification-on-birdclef | Multi-Label Augmentation Prediction (FSD50K, RN18) | 21.04±0.35 |
| few-shot-audio-classification-on-common-voice | Multi-Label Augmentation Prediction (FSD50K, RN18) | 23.00±0.42 |
| few-shot-audio-classification-on-common-voice | MT-SLVR (SimCLR + MLAP) w/ Parallel Adapters (FSD50K, RN18) | 35.22±0.40 |
| few-shot-audio-classification-on-common-voice | SimCLR (FSD50K, RN18) | 33.33±0.38 |
| few-shot-audio-classification-on-crema-d | Multi-Label Augmentation Prediction (FSD50K, RN18) | 21.68±0.33 |
| few-shot-audio-classification-on-crema-d | MT-SLVR (SimCLR + MLAP) w/ Parallel Adapters (FSD50K, RN18) | 29.61±0.38 |
| few-shot-audio-classification-on-crema-d | SimCLR (FSD50K, RN18) | 29.10±0.36 |
| few-shot-audio-classification-on-esc-50 | SimCLR (FSD50K, RN18) | 63.40±0.39 |
| few-shot-audio-classification-on-esc-50 | MT-SLVR (SimCLR + MLAP) w/ Parallel Adapters (FSD50K, RN18) | 69.53±0.39 |
| few-shot-audio-classification-on-esc-50 | Multi-Label Augmentation Prediction (FSD50K, RN18) | 37.76±0.34 |
| few-shot-audio-classification-on-nsynth | MT-SLVR (SimCLR + MLAP) w/ Parallel Adapters (FSD50K, RN18) | 71.81±0.39 |
| few-shot-audio-classification-on-nsynth | SimCLR (FSD50K, RN18) | 66.44±0.40 |
| few-shot-audio-classification-on-nsynth | Multi-Label Augmentation Prediction (FSD50K, RN18) | 62.52±0.36 |
| few-shot-audio-classification-on-speech | Multi-Label Augmentation Prediction (FSD50K, RN18) | 20.08±0.37 |
| few-shot-audio-classification-on-speech | SimCLR (FSD50K, RN18) | 25.68±0.35 |
| few-shot-audio-classification-on-speech | MT-SLVR (SimCLR + MLAP) w/ Parallel Adapters (FSD50K, RN18) | 23.65±0.34 |
| few-shot-audio-classification-on-speech-1 | MT-SLVR (SimCLR + MLAP) w/ Parallel Adapters (FSD50K, RN18) | 28.92±0.37 |
| few-shot-audio-classification-on-speech-1 | Multi-Label Augmentation Prediction (FSD50K, RN18) | 23.08±0.34 |
| few-shot-audio-classification-on-speech-1 | SimCLR (FSD50K, RN18) | 26.16±0.34 |
| few-shot-audio-classification-on-voxceleb1 | SimCLR (FSD50K, RN18) | 31.18±0.37 |
| few-shot-audio-classification-on-voxceleb1 | MT-SLVR (SimCLR + MLAP) w/ Parallel Adapters (FSD50K, RN18) | 33.58±0.39 |
| few-shot-audio-classification-on-voxceleb1 | Multi-Label Augmentation Prediction (FSD50K, RN18) | 21.68±0.40 |
| few-shot-audio-classification-on-watkins | Multi-Label Augmentation Prediction (FSD50K, RN18) | 28.88±0.39 |
| few-shot-audio-classification-on-watkins | MT-SLVR (SimCLR + MLAP) w/ Parallel Adapters (FSD50K, RN18) | 59.49±0.42 |
| few-shot-audio-classification-on-watkins | SimCLR (FSD50K, RN18) | 52.91±0.41 |
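
The listed results are easier to read as per-benchmark differences. The short script below is an illustrative sketch that copies the MT-SLVR and SimCLR accuracies from the listing above and reports the gap for each benchmark. The benchmark slugs are used as keys exactly as listed, including the first one, whose dataset name is cut off in the source. The script compares mean accuracies only and ignores the ± values, so small gaps should be read with caution.

```python
# Top-1 accuracy (5-way 1-shot) copied from the benchmark listing.
# Keys are the benchmark slugs as listed; values are (MT-SLVR, SimCLR).
PREFIX = "few-shot-audio-classification-on"
RESULTS = {
    PREFIX: (39.11, 37.64),
    PREFIX + "-birdclef": (29.49, 30.93),
    PREFIX + "-common-voice": (35.22, 33.33),
    PREFIX + "-crema-d": (29.61, 29.10),
    PREFIX + "-esc-50": (69.53, 63.40),
    PREFIX + "-nsynth": (71.81, 66.44),
    PREFIX + "-speech": (23.65, 25.68),
    PREFIX + "-speech-1": (28.92, 26.16),
    PREFIX + "-voxceleb1": (33.58, 31.18),
    PREFIX + "-watkins": (59.49, 52.91),
}


def compare(results):
    """Return {benchmark: MT-SLVR minus SimCLR}, rounded to 2 decimals."""
    return {name: round(mt_slvr - simclr, 2)
            for name, (mt_slvr, simclr) in results.items()}


deltas = compare(RESULTS)
improved = sorted(name for name, d in deltas.items() if d > 0)
regressed = sorted(name for name, d in deltas.items() if d < 0)

# Print benchmarks from largest gain to largest loss.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:48s} {delta:+.2f}")
print(f"MT-SLVR ahead on {len(improved)} of {len(deltas)} listed benchmarks")
```

For the rows listed here, MT-SLVR has the higher mean accuracy on eight of the ten benchmarks, and SimCLR is ahead on the birdclef and speech entries.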
