Li Xian, Li Xiaofei

Abstract
Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data, and then transfers that knowledge to a specific problem with a limited amount of labeled data. SSL has achieved promising results in various domains. This work addresses the problem of segment-level general audio SSL, and proposes a new transformer-based teacher-student SSL model, named ATST. A transformer encoder is developed on a recently emerged teacher-student baseline scheme, which largely improves the modeling capability of pre-training. In addition, a new strategy for positive pair creation is designed to fully leverage the capability of the transformer. Extensive experiments have been conducted, and the proposed model achieves new state-of-the-art results on almost all of the downstream tasks.
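To make the pre-training scheme described in the abstract concrete, below is a minimal PyTorch sketch of a segment-level teacher-student setup: two segments cropped from the same clip form a positive pair, and a transformer encoder on the student side is trained to match an EMA (exponential moving average) teacher's embedding of the other segment. This is not the authors' ATST implementation; the encoder size, segment length, predictor head, EMA rate, and loss below are illustrative assumptions.

```python
# Minimal teacher-student SSL sketch (assumptions marked; not the ATST code).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegmentEncoder(nn.Module):
    """Toy transformer encoder over a sequence of audio feature frames."""

    def __init__(self, feat_dim: int = 64, model_dim: int = 256, num_layers: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, model_dim)
        layer = nn.TransformerEncoderLayer(d_model=model_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, feat_dim) -> segment-level embedding (batch, model_dim)
        h = self.encoder(self.proj(x))
        return h.mean(dim=1)  # mean-pool frames into one segment embedding


def make_positive_pair(clip: torch.Tensor, seg_len: int):
    """Crop two random segments from one clip to form a positive pair.
    clip: (batch, frames, feat_dim). Illustrative; ATST's strategy differs in detail."""
    frames = clip.shape[1]
    s1, s2 = torch.randint(0, frames - seg_len + 1, (2,)).tolist()
    return clip[:, s1:s1 + seg_len], clip[:, s2:s2 + seg_len]


student = SegmentEncoder()
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is never trained by backprop

predictor = nn.Linear(256, 256)  # small prediction head on the student side (assumption)
opt = torch.optim.AdamW(list(student.parameters()) + list(predictor.parameters()), lr=1e-4)
ema_decay = 0.99  # assumed EMA rate

for step in range(10):  # stand-in for a real data loader
    clip = torch.randn(8, 400, 64)  # fake batch of log-mel frames
    view_a, view_b = make_positive_pair(clip, seg_len=200)

    # Student encodes one view; teacher encodes the other under no_grad.
    # The student is trained to predict the teacher's embedding
    # (negative cosine similarity as the loss).
    s = predictor(student(view_a))
    with torch.no_grad():
        t = teacher(view_b)
    loss = -F.cosine_similarity(s, t, dim=-1).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

    # EMA update of the teacher from the student weights.
    with torch.no_grad():
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(ema_decay).add_(sp, alpha=1.0 - ema_decay)
```

Stopping gradients through the teacher and updating it only by EMA is the defining trait of this teacher-student family of SSL methods; unlike contrastive approaches, no negative pairs are needed, which is why the positive pair creation strategy matters so much.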
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| audio-classification-on-balanced-audio-set | Base (ours) | Mean AP: 37.4 |
| speaker-identification-on-voxceleb1 | ATST Base (ours) | Top-1 Accuracy (%): 94.3 |
| spoken-command-recognition-on-speech-command | Base (ours) | Accuracy: 98.0 |