8 months ago

Audio and Speech Processing

Convolutional Neural Network

Method/Architecture

Moshe Mandel, Or Tal, Yossi Adi

Abstract

We present AERO, an audio super-resolution model that processes speech and music signals in the spectral domain. AERO is based on an encoder-decoder architecture with U-Net-like skip connections. We optimize the model using both time- and frequency-domain loss functions. Specifically, we consider a set of reconstruction losses together with perceptual ones in the form of adversarial and feature discriminator loss functions. To better handle phase information, the proposed method operates over the complex-valued spectrogram using two separate channels. Unlike prior work, which mainly concatenates low and high frequencies for audio super-resolution, the proposed method directly predicts the full frequency range. We demonstrate high performance across a wide range of sample rates on both speech and music. AERO outperforms the evaluated baselines on Log-Spectral Distance, ViSQOL, and the subjective MUSHRA test. Audio samples and code are available at https://pages.cs.huji.ac.il/adiyoss-lab/aero
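To make two ideas from the abstract concrete, the sketch below shows one way a complex-valued spectrogram can be split into two real-valued channels, and one common formulation of Log-Spectral Distance (LSD). It is an illustration under stated assumptions, not the authors' implementation: the function names (`stft`, `to_two_channels`, `from_two_channels`, `log_spectral_distance`), the Hann window, and the `n_fft` and `hop` values are all hypothetical choices, and the paper's exact LSD definition may differ.

```python
import numpy as np


def stft(signal, n_fft=512, hop=128):
    """Complex STFT with a Hann window; returns shape (frames, n_fft // 2 + 1).

    n_fft and hop are illustrative values, not settings taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def to_two_channels(spec):
    """Stack real and imaginary parts as two real channels: (2, frames, bins)."""
    return np.stack([spec.real, spec.imag], axis=0)


def from_two_channels(channels):
    """Inverse of to_two_channels: rebuild the complex spectrogram."""
    return channels[0] + 1j * channels[1]


def log_spectral_distance(reference, estimate, eps=1e-8):
    """One common LSD formulation (assumed here): the RMS over frequency of the
    log10 power-spectrum difference, averaged over frames."""
    log_ref = np.log10(np.abs(reference) ** 2 + eps)
    log_est = np.log10(np.abs(estimate) ** 2 + eps)
    per_frame = np.sqrt(np.mean((log_ref - log_est) ** 2, axis=-1))
    return float(np.mean(per_frame))


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    # Toy "full-band" signal and a version with the high-frequency tone missing.
    full_band = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
    low_band = np.sin(2 * np.pi * 440 * t)

    spec = stft(full_band)
    channels = to_two_channels(spec)
    print("two-channel input shape:", channels.shape)
    print("LSD(full, low):", log_spectral_distance(spec, stft(low_band)))
```

Keeping the real and imaginary parts as separate channels retains phase in the representation, so a real-valued convolutional network can consume and predict it directly, whereas a magnitude-only spectrogram discards phase.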

