ATST: Audio Representation Learning with Teacher-Student Transformer

Li Xian ; Li Xiaofei

Abstract

Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data, and then transfers the knowledge to a specific problem with a limited number of labeled data. SSL has achieved promising results in various domains. This work addresses the problem of segment-level general audio SSL, and proposes a new transformer-based teacher-student SSL model, named ATST. A transformer encoder is developed on a recently emerged teacher-student baseline scheme, which largely improves the modeling capability of pre-training. In addition, a new strategy for positive pair creation is designed to fully leverage the capability of transformer. Extensive experiments have been conducted, and the proposed model achieves the new state-of-the-art results on almost all of the downstream tasks.
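
The abstract does not describe the training mechanics. As a rough illustration only, the sketch below assumes a generic BYOL-style teacher-student setup: the teacher's weights track the student's through an exponential moving average, and the student is trained to match the teacher's embedding of a second view of the same clip. The function names, the momentum value, and the loss form are assumptions made for illustration and are not taken from the ATST paper.

```python
import math

def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight toward the matching student weight.

    `momentum` is a hypothetical default, not a value from the paper.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def regression_loss(student_out, teacher_out):
    """Squared distance between L2-normalised embeddings.

    For unit vectors this equals 2 - 2 * cosine similarity, so it is 0 when
    the two embeddings point in the same direction.
    """
    s, t = l2_normalize(student_out), l2_normalize(teacher_out)
    return sum((a - b) ** 2 for a, b in zip(s, t))
```

In a setup like this, only the student receives gradients from the loss; the teacher changes solely through `ema_update` after each student step.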

Benchmarks

Benchmark | Methodology | Metrics
audio-classification-on-balanced-audio-set | Base (ours) | Mean AP: 37.4
speaker-identification-on-voxceleb1 | ATST Base (ours) | Accuracy: 94.3; Top-1 (%): 94.3
spoken-command-recognition-on-speech-command | Base (ours) | Accuracy: 98.0
