HyperAI
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen

Abstract

Audio self-supervised learning (SSL) pre-training, which aims to learn good representations from unlabeled audio, has made remarkable progress. However, the extensive computational demands during pre-training pose a significant barrier to the potential application and optimization of audio SSL models. In this paper, inspired by the success of data2vec 2.0 in the image modality and Audio-MAE in the audio modality, we introduce the Efficient Audio Transformer (EAT) to further improve the effectiveness and efficiency of audio SSL. The proposed EAT adopts the bootstrap self-supervised training paradigm in the audio domain. A novel Utterance-Frame Objective (UFO) is designed to enhance the modeling capability of acoustic events. Furthermore, we reveal that the masking strategy is critical in audio SSL pre-training, and superior audio representations can be obtained with large inverse block masks. Experiment results demonstrate that EAT achieves state-of-the-art (SOTA) performance on a range of audio-related tasks, including AudioSet (AS-2M, AS-20K), ESC-50, and SPC-2, along with a significant pre-training speedup of up to ~15x compared to existing audio SSL models.
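The abstract highlights large inverse block masking as the key to good audio representations: instead of masking out random blocks of spectrogram patches, a few contiguous blocks are kept *visible* and everything else is masked. The paper does not include code here, so the following is a minimal sketch of that idea; the grid dimensions, keep ratio, and block size are illustrative assumptions, not EAT's actual hyperparameters.

```python
import numpy as np

def inverse_block_mask(grid_h, grid_w, keep_ratio=0.2, block_size=4, rng=None):
    """Sketch of inverse block masking over a patch grid.

    Square blocks are sampled to stay VISIBLE until at least
    keep_ratio of the patches are kept; all remaining patches are
    masked. Returns a boolean grid where True = masked patch.
    """
    rng = rng or np.random.default_rng()
    keep = np.zeros((grid_h, grid_w), dtype=bool)
    target_visible = int(grid_h * grid_w * keep_ratio)
    while keep.sum() < target_visible:
        # Place one block that remains visible to the encoder.
        top = rng.integers(0, grid_h - block_size + 1)
        left = rng.integers(0, grid_w - block_size + 1)
        keep[top:top + block_size, left:left + block_size] = True
    return ~keep  # invert: everything outside the kept blocks is masked

# Example: a 64x8 patch grid (e.g. a patchified 10-second mel spectrogram)
mask = inverse_block_mask(64, 8, keep_ratio=0.2, block_size=4,
                          rng=np.random.default_rng(0))
print(mask.shape, mask.mean())  # masked fraction is close to 1 - keep_ratio
```

Because visible patches arrive in contiguous blocks rather than scattered points, the model must infer large masked regions from local context, which is the property the paper credits for the stronger representations.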

Code Repositories

cwx-worst-one/eat (official, PyTorch)

Benchmarks

Benchmark                                   | Methodology | Metrics
audio-classification-on-audioset            | EAT         | Test mAP: 0.486
audio-classification-on-balanced-audio-set  | EAT         | Mean AP: 40.3
audio-classification-on-esc-50              | EAT         | Accuracy (5-fold): 96.0; Top-1 Accuracy: 96.0; pre-training dataset: AudioSet
audio-classification-on-speech-commands-1   | EAT         | Accuracy: 98.3±0.04
