CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation

Caroline Etienne; Guillaume Fidanza; Andrei Petrovskii; Laurence Devillers; Benoit Schmauch

Abstract

In this work we design a neural network for recognizing emotions in speech, using the IEMOCAP dataset. Following the latest advances in audio analysis, we use an architecture involving both convolutional layers, for extracting high-level features from raw spectrograms, and recurrent layers, for aggregating long-term dependencies. We examine data augmentation with vocal tract length perturbation, layer-wise optimizer adjustment, and batch normalization of recurrent layers, and obtain highly competitive results of 64.5% weighted accuracy and 61.7% unweighted accuracy on four emotions.
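
The abstract does not say how the perturbation is applied, so the following is only a minimal sketch of one common formulation of vocal tract length perturbation: a piecewise-linear warp of the frequency axis of a spectrogram by a factor alpha. The function names, the 16 kHz sample rate, the 4800 Hz boundary frequency, and the use of linear interpolation are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def vtlp_warp_frequencies(freqs, alpha, f_hi=4800.0, nyquist=8000.0):
    """Piecewise-linear frequency warp (assumed formulation).

    Frequencies below a boundary are scaled by alpha; frequencies above it
    are mapped linearly so that the Nyquist frequency stays fixed.
    """
    freqs = np.asarray(freqs, dtype=float)
    knee = f_hi * min(alpha, 1.0)      # warped value at the boundary
    boundary = knee / alpha            # boundary on the original axis
    low = freqs * alpha
    high = nyquist - (nyquist - knee) / (nyquist - boundary) * (nyquist - freqs)
    return np.where(freqs <= boundary, low, high)

def vtlp_spectrogram(spec, alpha, sample_rate=16000, f_hi=4800.0):
    """Warp a (n_freq_bins, n_frames) spectrogram along the frequency axis."""
    spec = np.asarray(spec, dtype=float)
    nyquist = sample_rate / 2.0
    freqs = np.linspace(0.0, nyquist, spec.shape[0])
    warped = vtlp_warp_frequencies(freqs, alpha, f_hi, nyquist)
    out = np.empty_like(spec)
    for t in range(spec.shape[1]):
        # Energy originally at freqs[i] is moved to warped[i]; resample the
        # warped curve back onto the regular frequency grid.
        out[:, t] = np.interp(freqs, warped, spec[:, t])
    return out

# Hypothetical usage: draw one warp factor per utterance, e.g. in [0.9, 1.1].
rng = np.random.default_rng(0)
spectrogram = rng.random((129, 50))
augmented = vtlp_spectrogram(spectrogram, alpha=rng.uniform(0.9, 1.1))
```

With alpha equal to 1 the warp is the identity, so the augmentation reduces to the original spectrogram; values slightly above or below 1 stretch or compress the lower part of the spectrum while keeping the endpoints of the frequency axis fixed.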

Benchmarks

Benchmark: speech-emotion-recognition-on-iemocap
Methodology: CNN+LSTM
Metrics: UA: 0.650
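
For readers unfamiliar with the metric names, the sketch below shows how weighted accuracy (WA) and unweighted accuracy (UA) are usually computed in emotion recognition work: WA is the overall fraction of correctly classified utterances, and UA is the mean of the per-class accuracies, so that rare classes count as much as frequent ones. The function names and the toy labels are illustrative assumptions; the page does not specify the exact evaluation code.

```python
import numpy as np

def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class accuracies (recalls), each class weighted equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class))

# Toy example with an imbalanced label set (labels are made up).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
wa = weighted_accuracy(y_true, y_pred)    # 5 of 6 correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of 4/4 and 1/2
```

On imbalanced data the two numbers can differ noticeably, which is why both are commonly reported for IEMOCAP.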
