3 months ago

Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Abstract

Masked Autoencoders is a simple yet powerful self-supervised learning method. However, it learns representations indirectly by reconstructing masked input patches. Several methods learn representations directly by predicting the representations of masked patches; however, we think that using all patches to encode the training signal representations is suboptimal. We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches. In M2D, the online network encodes visible patches and predicts masked patch representations, while the target network, a momentum encoder, encodes masked patches. To better predict target representations, the online network should model the input well, while the target network should also model it well to agree with the online predictions. The learned representations should then better model the input. We validated M2D by learning general-purpose audio representations, and M2D set new state-of-the-art performance on tasks such as UrbanSound8K, VoxCeleb1, AudioSet20K, GTZAN, and SpeechCommandsV2. We additionally validate the effectiveness of M2D for images using ImageNet-1K in the appendix.
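
The data flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the official nttcslab/m2d implementation. It only shows three ingredients under stated assumptions: partitioning patches into visible and masked sets, comparing an online prediction with a target representation, and updating a momentum (target) encoder from the online one. The function names, the squared error on L2-normalized vectors, and the exponential-moving-average update rule are illustrative assumptions; the abstract does not specify them.

```python
import math
import random


def split_patches(num_patches, mask_ratio, rng):
    """Randomly partition patch indices into (visible, masked) lists.

    Assumption: masking is uniform random. Per the abstract, the online
    network would see only `visible` and the target network only `masked`.
    """
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(round(num_patches * mask_ratio))
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def normalized_sq_error(pred, target):
    """Squared distance between L2-normalized vectors.

    Assumed loss for illustration: 0 when the two vectors point in the
    same direction, 2 when they are orthogonal.
    """
    p, t = l2_normalize(pred), l2_normalize(target)
    return sum((a - b) ** 2 for a, b in zip(p, t))


def ema_update(target_params, online_params, decay):
    """Momentum-encoder update (assumed form): target <- decay*target + (1-decay)*online."""
    return [decay * t + (1.0 - decay) * o
            for t, o in zip(target_params, online_params)]
```

For example, `split_patches(10, 0.6, random.Random(0))` yields 4 visible and 6 masked indices. The 0.6 here mirrors the "ratio=0.6" label in the benchmark listing, assuming that label refers to the masking ratio.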

Code Repositories

nttcslab/m2d (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
keyword-spotting-on-google-speech-commands | M2D | Google Speech Commands V2 35: 98.5
speaker-identification-on-voxceleb1 | M2D ratio=0.6 | Accuracy: 94.8; Top-1 (%): 94.8
