8 months ago

Audio Classification

Audio Recognition

Computer Vision

Ji Qingfeng ; Wang Yuxin ; Sun Letong

Abstract

Recently, MLP structures have regained popularity, with MLP-Mixer standing out as a prominent example. In the field of computer vision, MLP-Mixer is noted for its ability to extract data information from both channel and token perspectives, effectively acting as a fusion of channel and token information. Indeed, Mixer represents a paradigm for information extraction that amalgamates channel and token information. The essence of Mixer lies in its ability to blend information from diverse perspectives, epitomizing the true concept of "mixing" in the realm of neural network architectures. Beyond channel and token considerations, it is possible to create more tailored mixers from various perspectives to better suit specific task requirements. This study focuses on the domain of audio recognition, introducing a novel model named Audio Spectrogram Mixer with Roll-Time and Hermit FFT (ASM-RH) that incorporates insights from both time and frequency domains. Experimental results demonstrate that ASM-RH is particularly well-suited for audio data and yields promising outcomes across multiple classification tasks. The models and optimal weights files will be published.
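
To make the channel and token "mixing" described in the abstract concrete, here is a minimal NumPy sketch of one generic MLP-Mixer-style block. It is not the ASM-RH model: the abstract does not specify the Roll-Time or Hermit FFT components, so they are not attempted here. All names (`mixer_block`, `init_mlp`), the layer sizes, the weight initialization, and the simplified layer normalization without learnable parameters are assumptions chosen for illustration.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # Normalize over the channel (last) axis; learnable scale/shift omitted for brevity
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def init_mlp(rng, dim, hidden):
    # Hypothetical two-layer MLP parameters mapping dim -> hidden -> dim
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))

def mlp(x, w1, b1, w2, b2):
    # Applied along the last axis of x
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_block(x, token_params, channel_params):
    # x has shape (tokens, channels)
    # Token mixing: transpose so the MLP acts across tokens, shared over channels
    y = layer_norm(x)
    x = x + mlp(y.T, *token_params).T
    # Channel mixing: the MLP acts across channels, shared over tokens
    y = layer_norm(x)
    x = x + mlp(y, *channel_params)
    return x

rng = np.random.default_rng(0)
tokens, channels = 8, 16  # assumed sizes, e.g. 8 spectrogram patches with 16 features each
x = rng.normal(size=(tokens, channels))
token_params = init_mlp(rng, tokens, 32)
channel_params = init_mlp(rng, channels, 64)
out = mixer_block(x, token_params, channel_params)
print(out.shape)
```

The output keeps the input shape, so blocks of this kind can be stacked. A mixer tailored to another perspective, such as the time and frequency views the abstract mentions, would in principle swap in a different operation along a different axis, but how ASM-RH does this is not stated here.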

Mixer is more than just a model