Speaker Normalization for Self-supervised Speech Emotion Recognition

Itai Gat, Hagai Aronowitz, Weizhong Zhu, Edmilson Morais, Ron Hoory

Abstract

Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
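
The abstract does not spell out how the adversarial objective is implemented. A common way to realize this kind of gradient-based adversarial training is a gradient-reversal-style update, and the sketch below illustrates that idea on a toy model; it is an assumption for illustration, not the authors' implementation. All names (`adversarial_step`, `w`, `a`, `b`, `lam`, `lr`) and the binary emotion and speaker labels are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def adversarial_step(w, a, b, x, emotion, speaker, lam=1.0, lr=0.1):
    """One SGD step on a toy model with a shared linear encoder (illustrative only).

    z = w . x is the shared feature. sigmoid(a * z) predicts a binary emotion
    label and sigmoid(b * z) predicts a binary speaker label.
    """
    z = sum(wi * xi for wi, xi in zip(w, x))

    # Gradients of the two binary cross-entropy losses w.r.t. their logits.
    d_emo = sigmoid(a * z) - emotion
    d_spk = sigmoid(b * z) - speaker

    # Each head is trained normally and minimizes its own loss.
    new_a = a - lr * d_emo * z
    new_b = b - lr * d_spk * z

    # Gradient reaching the shared feature z: the speaker term is sign-flipped
    # and scaled by lam, so the encoder is pushed to increase the speaker loss
    # while still decreasing the emotion loss.
    dz = d_emo * a - lam * d_spk * b
    new_w = [wi - lr * dz * xi for wi, xi in zip(w, x)]
    return new_w, new_a, new_b


# Toy usage with made-up values.
w, a, b = adversarial_step([1.0, 1.0], 0.0, 1.0, x=[1.0, 0.0], emotion=1, speaker=1)
```

Setting `lam=0.0` in this sketch disconnects the speaker classifier from the encoder, while larger values trade emotion accuracy against removing speaker information from the shared feature.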

Benchmarks

Benchmark: speech-emotion-recognition-on-iemocap
Methodology: TAP
Metrics: WA 0.810, WA CV 0.742
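
The listing does not define WA. Assuming it denotes weighted accuracy as the term is commonly used in the IEMOCAP literature (the fraction of all utterances classified correctly), it could be computed as in the sketch below. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
def weighted_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration: 4 of 5 predictions match.
reference = ["ang", "hap", "neu", "sad", "neu"]
predicted = ["ang", "neu", "neu", "sad", "neu"]
wa = weighted_accuracy(reference, predicted)
```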
