Audio Tagging on AudioSet
Evaluation Metric
mean average precision
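
As a rough illustration of the metric, the sketch below computes mean average precision for multi-label tagging: average precision is computed per class from the ranked clip scores, then averaged over classes. It assumes macro averaging across classes and simple rank-based precision with no tie handling; the function names and the toy data are hypothetical, and the papers listed on this page may use a different implementation.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one class: mean of the precision at each positive's rank."""
    # Rank clips from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    # AP is undefined for a class with no positives; returning 0.0 is an assumption of this sketch.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    label_matrix: Sequence[Sequence[int]],
    score_matrix: Sequence[Sequence[float]],
) -> float:
    """Macro-averaged AP; rows are clips, columns are classes."""
    num_classes = len(label_matrix[0])
    aps = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        aps.append(average_precision(labels, scores))
    return sum(aps) / num_classes


# Toy example with made-up data: 4 clips, 2 classes.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.8, 0.7], [0.6, 0.4], [0.1, 0.3]]
map_value = mean_average_precision(y_true, y_score)
print(f"mAP = {map_value:.3f}")
```

On real data, a vetted implementation such as scikit-learn's `average_precision_score` with `average="macro"` is likely the safer choice, since it handles tied scores explicitly.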
Results
Performance of each model on this benchmark
| Model | mean average precision | Paper Title | Repository |
|---|---|---|---|
| CAV-MAE (Audio-Visual) | 0.512 | Contrastive Audio-Visual Masked Autoencoder | |
| mn40_as (Ensemble) | 0.498 | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | |
| PaSST | 0.496 | Efficient Training of Audio Transformers with Patchout | |
| DyMN-L (Audio-Only, Single) | 0.490 | Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models | |
| Audio Spectrogram Transformer | 0.485 | AST: Audio Spectrogram Transformer | |
| mn40_as (Single) | 0.483 | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | |
| PSLA | 0.474 | PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation | |
| ST-SED | 0.467 | Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data | |
| CAV-MAE (Audio-Only) | 0.466 | Contrastive Audio-Visual Masked Autoencoder | |
| ERANN-1-6 | 0.450 | ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition | - |
| CNN14 | 0.431 | PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition | |