5 months ago

Mixer is more than just a model

Ji Qingfeng ; Wang Yuxin ; Sun Letong

Abstract

Recently, MLP structures have regained popularity, with MLP-Mixer standing out as a prominent example. In the field of computer vision, MLP-Mixer is noted for its ability to extract data information from both channel and token perspectives, effectively acting as a fusion of channel and token information. Indeed, Mixer represents a paradigm for information extraction that amalgamates channel and token information. The essence of Mixer lies in its ability to blend information from diverse perspectives, epitomizing the true concept of "mixing" in the realm of neural network architectures. Beyond channel and token considerations, it is possible to create more tailored mixers from various perspectives to better suit specific task requirements. This study focuses on the domain of audio recognition, introducing a novel model named Audio Spectrogram Mixer with Roll-Time and Hermit FFT (ASM-RH) that incorporates insights from both time and frequency domains. Experimental results demonstrate that ASM-RH is particularly well-suited for audio data and yields promising outcomes across multiple classification tasks. The models and optimal weights files will be published.
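
To make the idea of mixing along two perspectives concrete, the following is a minimal, dependency-free sketch of a generic MLP-Mixer block: one MLP mixes across tokens and a second MLP mixes across channels, each with a residual connection. It is an illustration only and is not the ASM-RH model; the Roll-Time and Hermit FFT components are not specified in the abstract and are not reproduced here. All function names, the sizes (4 tokens, 6 channels, hidden width 8), the random weights, and the layer norm without learned parameters are assumptions chosen for the example.

```python
import math
import random

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def matmul(a, b):
    # (n x k) times (k x m) -> (n x m), using nested lists
    cols = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    # element-wise sum, used for the residual connections
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def layer_norm(x, eps=1e-5):
    # normalise each row (one token) over its channels; no learned scale or shift
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def mlp(x, w1, w2):
    # two-layer MLP applied independently to every row of x
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def mixer_block(x, token_w1, token_w2, chan_w1, chan_w2):
    # x has shape (tokens x channels)
    # Token mixing: transpose so each row is one channel, then mix across tokens
    mixed = transpose(mlp(transpose(layer_norm(x)), token_w1, token_w2))
    x = add(x, mixed)
    # Channel mixing: each row is one token, mix across its channels
    return add(x, mlp(layer_norm(x), chan_w1, chan_w2))

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
tokens, channels, hidden = 4, 6, 8  # hypothetical sizes
x = random_matrix(tokens, channels, rng, scale=1.0)
out = mixer_block(
    x,
    random_matrix(tokens, hidden, rng), random_matrix(hidden, tokens, rng),
    random_matrix(channels, hidden, rng), random_matrix(hidden, channels, rng),
)
print(len(out), len(out[0]))  # shape is preserved: 4 6
```

Because both mixing steps preserve the input shape, blocks of this kind can be stacked, and either MLP could in principle be swapped for a mixer that works along a different axis of the data, which is the general direction the abstract describes for time and frequency.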

Benchmarks

Benchmark                                    Methodology   Metrics
audio-classification-on-ravdess              ASM-RH-A      Top-1 Accuracy: 75.4
audio-classification-on-speech-commands-1    ASM-RH        Accuracy: 96.51
