

SubSpectral Normalization for Neural Audio Data Processing

Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack Yun, Kyuwoong Hwang


Abstract

Convolutional Neural Networks (CNNs) are widely used across machine learning domains. In image processing, features are obtained by applying 2D convolution over all spatial dimensions of the input. In audio, however, frequency-domain inputs such as mel-spectrograms have distinct characteristics along the frequency dimension, so a 2D convolution layer needs a way to treat the frequency dimension differently. In this work, we introduce SubSpectral Normalization (SSN), which splits the input frequency dimension into several groups (sub-bands) and normalizes each group separately. SSN also includes an affine transformation that can be applied to each group. Our method removes the inter-frequency deflection while the network learns frequency-aware characteristics. In experiments with audio data, we observed that SSN efficiently improves the network's performance.
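The abstract describes SSN only at a high level, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes batch-normalization-style statistics computed in training mode (mean and variance over the batch, the sub-band's frequency bins, and time), no running averages, and S contiguous sub-bands of equal width. The `affine` switch reflects our reading of the benchmark notation `A=Sub` (one scale/shift pair per channel and sub-band) versus a shared per-channel affine (`"all"`). The function name `subspectral_norm` and its parameters are hypothetical.

```python
import numpy as np


def subspectral_norm(x, num_subbands, gamma, beta, affine="sub", eps=1e-5):
    """Hypothetical sketch of SubSpectral Normalization (training-mode statistics only).

    x: array of shape (N, C, F, T) = (batch, channels, frequency bins, time frames).
    num_subbands: S, the number of frequency groups; F must be divisible by S.
    affine="sub": gamma/beta have shape (C, S), one pair per channel per sub-band (assumed A=Sub).
    affine="all": gamma/beta have shape (C,), shared across all sub-bands.
    """
    n, c, f, t = x.shape
    s = num_subbands
    if f % s != 0:
        raise ValueError("frequency dimension must be divisible by num_subbands")
    # Split the frequency axis into S contiguous sub-bands: (N, C, S, F/S, T).
    xs = x.reshape(n, c, s, f // s, t)
    # Separate statistics for every (channel, sub-band) pair, pooled over batch, bins, and time.
    mean = xs.mean(axis=(0, 3, 4), keepdims=True)
    var = xs.var(axis=(0, 3, 4), keepdims=True)
    xs = (xs - mean) / np.sqrt(var + eps)
    if affine == "sub":
        g = np.asarray(gamma).reshape(1, c, s, 1, 1)
        b = np.asarray(beta).reshape(1, c, s, 1, 1)
    elif affine == "all":
        g = np.asarray(gamma).reshape(1, c, 1, 1, 1)
        b = np.asarray(beta).reshape(1, c, 1, 1, 1)
    else:
        raise ValueError("affine must be 'sub' or 'all'")
    # Apply the affine transform and restore the original (N, C, F, T) layout.
    return (xs * g + b).reshape(n, c, f, t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical input: batch of 8, 16 channels, 40 mel bins, 101 frames.
    x = rng.normal(size=(8, 16, 40, 101))
    c, s = 16, 4
    y = subspectral_norm(x, s, np.ones((c, s)), np.zeros((c, s)), affine="sub")
    print(y.shape)  # (8, 16, 40, 101)
```

With `affine="sub"` the layer carries 2·C·S affine parameters instead of 2·C. An equivalent formulation, convenient in deep-learning frameworks, reshapes (N, C, F, T) to (N, C·S, F/S, T) and applies an ordinary 2D batch normalization with C·S channels before reshaping back.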

Benchmarks

Benchmark | Methodology | Metric
keyword-spotting-on-google-speech-commands | res8 w/ SSN(S=4, A=Sub) | Test accuracy: 95.4% ± 0.22
keyword-spotting-on-google-speech-commands | res15 w/ SSN(S=4, A=Sub) (2019) | Test accuracy: 97.5% ± 0.15
keyword-spotting-on-google-speech-commands | res15 w/ SSN(S=4, A=Sub) | Test accuracy: 96.8% ± 0.13
keyword-spotting-on-tau-urban-acoustic-scenes | CP-ResNet(ch64) w/ SSN(S=2, A=Sub) | Accuracy: 83.6% ± 0.07
keyword-spotting-on-tau-urban-acoustic-scenes | CP-ResNet(ch128) w/ SSN(S=2, A=Sub) | Accuracy: 84.1% ± 0.20
