EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection
Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby, Felix Friedrich, Maurice Kraus, Kourosh Nadi, Huu Nguyen, Kristian Kersting, Sören Auer

Abstract
The advancement of text-to-speech and audio generation models necessitates robust benchmarks for evaluating the emotional understanding capabilities of AI systems. Current speech emotion recognition (SER) datasets often exhibit limitations in emotional granularity, privacy concerns, or reliance on acted portrayals. This paper introduces EmoNet-Voice, a new resource for speech emotion detection, which includes EmoNet-Voice Big, a large-scale pre-training dataset (featuring over 4,500 hours of speech across 11 voices, 40 emotions, and 4 languages), and EmoNet-Voice Bench, a novel benchmark dataset with human expert annotations. EmoNet-Voice is designed to evaluate SER models on a fine-grained spectrum of 40 emotion categories with different levels of intensity. Leveraging state-of-the-art voice generation, we curated synthetic audio snippets simulating actors portraying scenes designed to evoke specific emotions. Crucially, we conducted rigorous validation by psychology experts who assigned perceived intensity labels. This synthetic, privacy-preserving approach allows for the inclusion of sensitive emotional states often absent in existing datasets. Lastly, we introduce Empathic Insight Voice models that set a new standard in speech emotion recognition with high agreement with human experts. Our evaluations across the current model landscape yield valuable findings, such as high-arousal emotions like anger being much easier to detect than low-arousal states like concentration.
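To make the shape of the resource concrete, below is a minimal sketch of how one might load and inspect a speech emotion benchmark of this kind with the Hugging Face `datasets` library. The dataset identifier, split name, and field names ("audio", "emotion", "intensity") are assumptions for illustration only, not the schema published by the authors.

```python
# Hypothetical sketch of inspecting a fine-grained SER benchmark.
# The repository id "laion/EmoNet-Voice-Bench" and the column names below
# are assumed for illustration and may differ from the released dataset.
from collections import Counter

from datasets import load_dataset

# Load the (assumed) benchmark split.
bench = load_dataset("laion/EmoNet-Voice-Bench", split="test")

# Count how often each fine-grained emotion label occurs.
label_counts = Counter(example["emotion"] for example in bench)
for emotion, count in label_counts.most_common(10):
    print(f"{emotion:>20s}: {count}")

# Inspect one sample: decoded audio array, sampling rate, and expert intensity label.
sample = bench[0]
print(sample["audio"]["array"].shape, sample["audio"]["sampling_rate"])
print(sample["emotion"], sample["intensity"])
```

A per-emotion breakdown like this is what allows comparisons such as the one reported above, e.g. contrasting detection performance on high-arousal emotions (anger) against low-arousal states (concentration).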