Li Xian, Li Xiaofei

Abstract
Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data, and then transfers that knowledge to a specific problem with a limited amount of labeled data. SSL has achieved promising results in various domains. This work addresses the problem of segment-level general audio SSL, and proposes a new transformer-based teacher-student SSL model, named ATST. A transformer encoder is developed on a recently emerged teacher-student baseline scheme, which largely improves the modeling capability of pre-training. In addition, a new strategy for positive pair creation is designed to fully leverage the capability of the transformer. Extensive experiments have been conducted, and the proposed model achieves new state-of-the-art results on almost all of the downstream tasks.
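To make the pre-training scheme described in the abstract concrete, below is a minimal PyTorch sketch of a segment-level teacher-student setup: two segments cropped from the same clip form a positive pair, and a transformer encoder on the student side is trained to match an EMA (exponential moving average) teacher's embedding of the other segment. This is not the authors' ATST implementation; the encoder size, segment length, predictor head, EMA rate, and loss below are illustrative assumptions.

```python
# Minimal teacher-student SSL sketch (assumptions marked; not the ATST code).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegmentEncoder(nn.Module):
    """Toy transformer encoder over a sequence of audio feature frames."""

    def __init__(self, feat_dim: int = 64, model_dim: int = 256, num_layers: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, model_dim)
        layer = nn.TransformerEncoderLayer(d_model=model_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, feat_dim) -> segment-level embedding (batch, model_dim)
        h = self.encoder(self.proj(x))
        return h.mean(dim=1)  # mean-pool frames into one segment embedding


def make_positive_pair(clip: torch.Tensor, seg_len: int):
    """Crop two random segments from one clip to form a positive pair.
    clip: (batch, frames, feat_dim). Illustrative; ATST's strategy differs in detail."""
    frames = clip.shape[1]
    s1, s2 = torch.randint(0, frames - seg_len + 1, (2,)).tolist()
    return clip[:, s1:s1 + seg_len], clip[:, s2:s2 + seg_len]


student = SegmentEncoder()
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is never trained by backprop

predictor = nn.Linear(256, 256)  # small prediction head on the student side (assumption)
opt = torch.optim.AdamW(list(student.parameters()) + list(predictor.parameters()), lr=1e-4)
ema_decay = 0.99  # assumed EMA rate

for step in range(10):  # stand-in for a real data loader
    clip = torch.randn(8, 400, 64)  # fake batch of log-mel frames
    view_a, view_b = make_positive_pair(clip, seg_len=200)

    # Student encodes one view; teacher encodes the other under no_grad.
    # The student is trained to predict the teacher's embedding
    # (negative cosine similarity as the loss).
    s = predictor(student(view_a))
    with torch.no_grad():
        t = teacher(view_b)
    loss = -F.cosine_similarity(s, t, dim=-1).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

    # EMA update of the teacher from the student weights.
    with torch.no_grad():
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(ema_decay).add_(sp, alpha=1.0 - ema_decay)
```

Stopping gradients through the teacher and updating it only by EMA is the defining trait of this teacher-student family of SSL methods; unlike contrastive approaches, no negative pairs are needed, which is why the positive pair creation strategy matters so much.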
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| audio-classification-on-balanced-audio-set | Base (ours) | Mean AP: 37.4 |
| speaker-identification-on-voxceleb1 | ATST Base (ours) | Top-1 Accuracy (%): 94.3 |
| spoken-command-recognition-on-speech-command | Base (ours) | Accuracy: 98.0 |