Speaker Identification on VoxCeleb1
Evaluation Metrics
Accuracy
Top-1 (%)
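Top-1 accuracy is the percentage of test utterances whose highest-scoring predicted speaker matches the true speaker. The following is a minimal sketch of that computation, not the evaluation code used by any of the listed papers. The function name `top1_accuracy` and the toy scores and labels are hypothetical and serve only as illustration.

```python
def top1_accuracy(scores, labels):
    """Return top-1 accuracy in percent.

    scores: one list of per-speaker scores for each utterance.
    labels: the true speaker index for each utterance.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one utterance is required")

    correct = 0
    for row, label in zip(scores, labels):
        # The prediction is the index of the highest score in the row.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Made-up scores for 4 utterances over 3 hypothetical speakers.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
toy_labels = [1, 0, 2, 0]

toy_accuracy = top1_accuracy(toy_scores, toy_labels)
print(f"Top-1 accuracy: {toy_accuracy:.1f}%")  # 3 of 4 correct: 75.0%
```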
Results
Performance of each model on this benchmark.
| Model | Accuracy | Top-1 (%) | Paper Title | Repository |
|---|---|---|---|---|
| MSM-MAE | 96.6 | 96.6 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | |
| M2D/0.6 | 96.5 | 96.5 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | |
| M2D/0.7 | 96.3 | 96.3 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | |
| M2D ratio=0.6 | 94.8 | 94.8 | Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input | |
| AudioMAE (local) | 94.8 | 94.8 | Masked Autoencoders that Listen | |
| ATST Base (ours) | 94.3 | 94.3 | ATST: Audio Representation Learning with Teacher-Student Transformer | |
| AudioMAE (global) | 94.1 | 94.1 | Masked Autoencoders that Listen | |
| AutoSpeech (N=8,C=128) | 87.66 | 87.66 | AutoSpeech: Neural Architecture Search for Speaker Recognition | |
| SSAST-FRAME | 80.8 | 80.8 | SSAST: Self-Supervised Audio Spectrogram Transformer | |
| SSAMBA | 70.1 | 70.1 | SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | |
| SSAST-PATCH | 64.2 | 64.2 | SSAST: Self-Supervised Audio Spectrogram Transformer | |
| COLA | 37.7 | 37.7 | Contrastive Learning of General-Purpose Audio Representations | |