Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks

Li Xian ; Shao Nian ; Li Xiaofei

Abstract

Self-supervised learning (SSL) has emerged as a popular approach for learning audio representations. One goal of audio self-supervised pre-training is to transfer knowledge to downstream audio tasks, generally including clip-level and frame-level tasks. While frame-level tasks are important for fine-grained acoustic scene/event understanding, prior studies primarily evaluate on clip-level downstream tasks. In order to tackle both clip-level and frame-level tasks, this paper proposes Audio Teacher-Student Transformer (ATST), with a clip-level version (named ATST-Clip) and a frame-level version (named ATST-Frame), responsible for learning clip-level and frame-level representations, respectively. Both methods use a Transformer encoder and a teacher-student training scheme. We have carefully designed the view creation strategy for ATST-Clip and ATST-Frame. Specifically, ATST-Clip uses segment-wise data augmentations, and ATST-Frame integrates frame-wise data augmentations and masking. Experimental results show that our ATST-Frame model obtains state-of-the-art (SOTA) performances on most of the clip-level and frame-level downstream tasks. Especially, it outperforms other models by a large margin on the frame-level sound event detection task. In addition, the performance can be further improved by combining the two models through knowledge distillation. Our code is available online.
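
To make the teacher-student scheme and frame masking more concrete, here is a minimal sketch in plain Python. It rests on assumptions that the abstract does not state: that the teacher's weights track the student's through an exponential moving average (a common choice in teacher-student SSL), and that masking hides a random subset of frames from the student. The function names, the decay value, and the mask ratio are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random


def ema_update(teacher, student, decay=0.99):
    """Assumed update rule: move each teacher weight slightly toward
    the matching student weight (exponential moving average)."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def mask_frames(num_frames, mask_ratio=0.5, seed=0):
    """Assumed masking: pick a random subset of frame indices to hide
    from the student. Returns one boolean per frame (True = masked)."""
    rng = random.Random(seed)
    num_masked = int(round(num_frames * mask_ratio))
    masked = set(rng.sample(range(num_frames), num_masked))
    return [i in masked for i in range(num_frames)]


# Toy weights: after one step the teacher has moved 10% of the way
# toward the student.
teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)

# Half of ten frames are hidden from the student.
mask = mask_frames(10, mask_ratio=0.5)
```

In a setup like this, the student would see the augmented and masked view while the teacher sees a differently augmented view, and the student is trained to match the teacher's frame-level outputs.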

Code Repositories

Audio-WestlakeU/ATST-SED (PyTorch, mentioned in GitHub)
audio-westlakeu/audiossl (official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark                           Methodology         Metrics
audio-classification-on-audioset    ATST-Frame          Test mAP: 0.480
audio-classification-on-audioset    ATST-C2F(Single)    Test mAP: 0.497
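
For readers unfamiliar with the metric, the sketch below shows one common way to compute mean average precision (mAP) for multi-label classification: per-class, non-interpolated average precision, averaged over classes that have at least one positive example. The page does not say exactly how the reported values were computed, so treat this as an illustration of the general definition; the function names and the toy data are hypothetical.

```python
def average_precision(labels, scores):
    """Non-interpolated AP for one class: mean of the precision values
    measured at the rank of each positive example."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(label_rows, score_rows):
    """Average the per-class AP over classes with at least one positive.
    Rows are clips, columns are classes."""
    num_classes = len(label_rows[0])
    aps = []
    for c in range(num_classes):
        labels = [row[c] for row in label_rows]
        scores = [row[c] for row in score_rows]
        if any(labels):
            aps.append(average_precision(labels, scores))
    return sum(aps) / len(aps)


# Toy example: three clips, two classes.
labels = [[1, 0], [0, 1], [1, 0]]
scores = [[0.9, 0.2], [0.8, 0.9], [0.7, 0.1]]
map_value = mean_average_precision(labels, scores)
```

In the toy data, the second class is ranked perfectly while the first class has one negative clip ranked above a positive one, so the mAP falls below 1.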
