Audio Classification On Audioset

Evaluation Metric

Test mAP
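The page names the metric but does not define it. As a minimal sketch, assuming the usual multi-label convention for AudioSet (average precision computed per class, then averaged over classes), mAP can be computed as below. The function names and the toy data are hypothetical and for illustration only; individual leaderboard entries may use a different implementation or tie-handling rule.

```python
def average_precision(labels, scores):
    """Average precision for one class: mean of precision@k taken at the rank of each positive."""
    # Rank clips by descending score (sorted() is stable, so ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    # AP is undefined for a class with no positive clips; signal that with None.
    return sum(precisions) / len(precisions) if precisions else None


def mean_average_precision(label_matrix, score_matrix):
    """Macro-average AP over classes (columns), skipping classes without positives."""
    num_classes = len(label_matrix[0])
    aps = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        ap = average_precision(labels, scores)
        if ap is not None:
            aps.append(ap)
    if not aps:
        raise ValueError("no class has a positive example")
    return sum(aps) / len(aps)


# Toy data (assumed): 4 clips, 2 classes, multi-hot labels and model scores.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.4, 0.6], [0.1, 0.3]]
toy_map = mean_average_precision(toy_labels, toy_scores)
```

On the toy data, class 0 has positives at ranks 1 and 3 (AP = (1 + 2/3) / 2 = 5/6) and class 1 has positives at ranks 1 and 2 (AP = 1), so the mean is 11/12, or about 0.917.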

Evaluation Results

Performance of each model on this benchmark

Model | Test mAP | Paper Title | Repository
OmniVec2 | 0.558 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | -
OmniVec | 0.548 | OmniVec: Learning robust representations with cross modal sharing | -
EquiAV | 0.546 | EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning |
MAViL (Audio-Visual, single) | 0.533 | - | -
Audiovisual Masked Autoencoder (Audiovisual, Single) | 0.518 | Audiovisual Masked Autoencoders |
CAV-MAE (Audio-Visual) | 0.512 | Contrastive Audio-Visual Masked Autoencoder |
BEATs (Audio-only, Ensemble) | 0.506 | BEATs: Audio Pre-Training with Acoustic Tokenizers |
UAVM (Audio + Video) | 0.504 | UAVM: Towards Unifying Audio and Visual Models |
SSLAM (Audio-Only, Single) | 0.502 | SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes | -
mn40_as (Ensemble) | 0.498 | Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation |
ATST-C2F (Single) | 0.497 | Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks |
MBT (AS-500K training + Video) | 0.496 | Attention Bottlenecks for Multimodal Fusion |
PaSST (Ensemble) | 0.496 | Efficient Training of Audio Transformers with Patchout |
DyMN-L (Audio-Only, Single) | 0.490 | Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models |
HTS-AT (Ensemble) | 0.487 | HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection |
EAT | 0.486 | EAT: Self-Supervised Pre-Training with Efficient Audio Transformer |
BEATs (Audio-only, Single) | 0.486 | BEATs: Audio Pre-Training with Acoustic Tokenizers |
DTF-AT (Single) | 0.486 | DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification | -
M2D-AS/0.7 | 0.485 | Masked Modeling Duo: Towards a Universal Audio Pre-training Framework |
AST (Ensemble) | 0.485 | AST: Audio Spectrogram Transformer |