HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

Ziyang Ma Zhisheng Zheng Changli Tang Yujin Wang Xie Chen

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

Abstract

In this paper, we provide a new perspective on self-supervised speech models from how the training targets are obtained. We generalize the targets extractor into Offline Targets Extractor (Off-TE) and Online Targets Extractor (On-TE). Based on this, we propose a new multi-tasking learning framework for self-supervised learning, MT4SSL, which stands for Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets. MT4SSL uses the K-means algorithm as an Off-TE and a teacher network without gradients as an On-TE, respectively. Our model outperforms previous SSL methods by nontrivial margins on the LibriSpeech benchmark, and is comparable to or even better than the best-performing models with fewer data. Furthermore, we find that using both Off-TE and On-TE results in better convergence in the pre-training phase. With both effectiveness and efficiency, we think doing multi-task learning on self-supervised speech models from our perspective is a promising trend.

Code Repositories

ddlbojack/mt4ssl
Official
pytorch
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
speech-recognition-on-librispeech-test-cleanMT4SSL
Word Error Rate (WER): 3.4
speech-recognition-on-librispeech-test-otherMT4SSL
Word Error Rate (WER): 9.6

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | Papers | HyperAI