Tracing Intricate Cues in Dialogue: Joint Graph Structure and Sentiment Dynamics for Multimodal Emotion Recognition

Jiang Li, Xiaoping Wang, Zhigang Zeng


Abstract

Multimodal emotion recognition in conversation (MERC) has garnered substantial research attention recently. Existing MERC methods face several challenges: (1) they fail to fully harness direct inter-modal cues, possibly leading to less-than-thorough cross-modal modeling; (2) they concurrently extract information from the same and different modalities at each network layer, potentially triggering conflicts from the fusion of multi-source data; (3) they lack the agility required to detect dynamic sentimental changes, perhaps resulting in inaccurate classification of utterances with abrupt sentiment shifts. To address these issues, a novel approach named GraphSmile is proposed for tracking intricate emotional cues in multimodal dialogues. GraphSmile comprises two key components, i.e., GSF and SDP modules. GSF ingeniously leverages graph structures to alternately assimilate inter-modal and intra-modal emotional dependencies layer by layer, adequately capturing cross-modal cues while effectively circumventing fusion conflicts. SDP is an auxiliary task to explicitly delineate the sentiment dynamics between utterances, promoting the model's ability to distinguish sentimental discrepancies. Furthermore, GraphSmile is effortlessly applied to multimodal sentiment analysis in conversation (MSAC), forging a unified multimodal affective model capable of executing MERC and MSAC tasks. Empirical results on multiple benchmarks demonstrate that GraphSmile can handle complex emotional and sentimental patterns, significantly outperforming baseline models.
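Since the abstract describes GSF and SDP only at a high level, the following is a minimal PyTorch sketch of the two ideas: graph layers that alternate between an inter-modal and an intra-modal edge set from one layer to the next, plus an auxiliary head that classifies the sentiment shift between adjacent utterances. All module names, the graph construction, and the label counts (num_emotions, num_shifts) are illustrative assumptions, not the authors' implementation; the official code in lijfrank-open/GraphSmile is the authoritative reference.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphSmileSketch(nn.Module):
    """Toy illustration of the GSF/SDP ideas: alternate inter-modal and
    intra-modal graph propagation layer by layer, then jointly predict
    each utterance's emotion and the sentiment shift between adjacent
    utterances. Not the paper's architecture."""

    def __init__(self, dim=128, num_layers=4, num_emotions=7, num_shifts=3):
        super().__init__()
        # One transform per layer; even layers use inter-modal edges,
        # odd layers use intra-modal edges (the alternating scheme).
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        self.emo_head = nn.Linear(3 * dim, num_emotions)    # fused text/audio/visual
        self.shift_head = nn.Linear(6 * dim, num_shifts)    # pair of adjacent utterances

    def propagate(self, x, adj, layer):
        # One graph step: degree-normalized neighborhood averaging + residual.
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return F.relu(layer((adj @ x) / deg) + x)

    def forward(self, feats, adj_inter, adj_intra):
        # feats: (num_utterances * 3, dim) stacked text/audio/visual nodes.
        x = feats
        for i, layer in enumerate(self.layers):
            adj = adj_inter if i % 2 == 0 else adj_intra    # alternate edge sets
            x = self.propagate(x, adj, layer)
        n = x.size(0) // 3
        # Concatenate the three modality views of each utterance.
        fused = torch.cat([x[:n], x[n:2 * n], x[2 * n:]], dim=-1)
        emo_logits = self.emo_head(fused)
        # SDP-style auxiliary output: one shift label per adjacent pair.
        shift_logits = self.shift_head(torch.cat([fused[:-1], fused[1:]], dim=-1))
        return emo_logits, shift_logits

if __name__ == "__main__":
    n_utt, dim = 5, 128
    model = GraphSmileSketch(dim=dim)
    feats = torch.randn(n_utt * 3, dim)            # stacked modality nodes
    adj_inter = torch.ones(n_utt * 3, n_utt * 3)   # placeholder inter-modal edges
    adj_intra = torch.eye(n_utt * 3)               # placeholder intra-modal edges
    emo, shift = model(feats, adj_inter, adj_intra)
    print(emo.shape, shift.shape)                  # (5, 7) and (4, 3)
```

The placeholder adjacency matrices stand in for whatever cross-modal and within-modal edge construction the paper actually uses; the point of the sketch is only that the two edge sets are consumed on alternating layers rather than fused simultaneously, which is how the abstract says GSF avoids multi-source fusion conflicts.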

Code Repositories

lijfrank-open/GraphSmile (official implementation, PyTorch)

Benchmarks

Benchmark                                        Methodology   Accuracy   Weighted-F1
emotion-recognition-in-conversation-on           GraphSmile    72.77      72.81
emotion-recognition-in-conversation-on-7         GraphSmile    86.53      86.52
emotion-recognition-in-conversation-on-cmu-2     GraphSmile    46.82      44.93
emotion-recognition-in-conversation-on-cmu-3     GraphSmile    67.73      66.73
emotion-recognition-in-conversation-on-meld      GraphSmile    67.70      66.71
emotion-recognition-in-conversation-on-meld-1    GraphSmile    74.44      74.31
multimodal-emotion-recognition-on-cmu-mosei-1    GraphSmile    46.82      44.93
multimodal-emotion-recognition-on-cmu-mosei-2    GraphSmile    67.73      66.73
multimodal-emotion-recognition-on-iemocap        GraphSmile    72.77      72.81
multimodal-emotion-recognition-on-iemocap-4      GraphSmile    86.53      86.52
multimodal-emotion-recognition-on-meld           GraphSmile    67.70      66.71
multimodal-emotion-recognition-on-meld-1         GraphSmile    74.44      74.31
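The Weighted-F1 figures above are the support-weighted average of per-class F1 scores, the conventional metric for class-imbalanced conversation datasets such as MELD. A standard way to compute it (a generic sketch using scikit-learn, not the paper's evaluation code) is:

```python
from sklearn.metrics import accuracy_score, f1_score

# Illustrative per-utterance emotion labels, not real model output.
y_true = [0, 1, 2, 1, 0, 2, 2]
y_pred = [0, 1, 2, 0, 0, 2, 1]

print("Accuracy:   ", 100 * accuracy_score(y_true, y_pred))
print("Weighted-F1:", 100 * f1_score(y_true, y_pred, average="weighted"))
```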
