CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI

Yirong Chen; Weiquan Fan; Xiaofen Xing; Jianxin Pang; Minlie Huang; Wenjing Han; Qianfeng Tie; Xiangmin Xu

Abstract

Human language expression is based on the subjective construal of a situation rather than on objective truth conditions, which means that speakers' personalities and emotions, after cognitive processing, have an important influence on conversation. However, most existing datasets for conversational AI ignore human personalities and emotions, or consider only some of them. It remains difficult for dialogue systems to understand speakers' personalities and emotions, even though large-scale pre-trained language models are widely used. To consider both personalities and emotions in conversation generation, we propose CPED, a large-scale Chinese personalized and emotional dialogue dataset that consists of multi-source knowledge related to empathy and personal characteristics. This knowledge covers gender, Big Five personality traits, 13 emotions, 19 dialogue acts, and 10 scenes. CPED contains more than 12K dialogues from 392 speakers in 40 TV shows. We release the textual dataset together with audio and video features, in accordance with copyright claims, privacy considerations, and the terms of service of video platforms. We describe the CPED construction process in detail and introduce three tasks for conversational AI: personality recognition, emotion recognition in conversations, and personalized and emotional conversation generation. Finally, we provide baseline systems for these tasks and examine the role of speakers' personalities and emotions in conversation. Our motivation is to propose a dataset that the NLP community can widely adopt as a new open benchmark for conversational AI research. The full dataset is available at https://github.com/scutcyr/CPED.

Code Repositories

scutcyr/CPED (official; PyTorch; mentioned in GitHub)

Benchmarks

Benchmark: emotion-recognition-in-conversation-on-cped

| Methodology | Accuracy of Sentiment | Macro-F1 of Sentiment |
|---|---|---|
| BERT+AVG+MLP | 51.50 | 48.02 |

Benchmark: personality-recognition-in-conversation-on-1

| Methodology | Accuracy (%) | Accuracy of Agreeableness | Accuracy of Conscientiousness | Accuracy of Extraversion | Accuracy of Neuroticism | Accuracy of Openness | Macro-F1 |
|---|---|---|---|---|---|---|---|
| BERT$_{ssenet}^{c}$ | 67.25 | 85.89 | 63.48 | 78.21 | 53.27 | 55.42 | 74.08 |
| BERT$^{s}$ | 67.23 | 85.76 | 63.60 | 78.08 | 50.75 | 57.93 | 72.93 |
| BERT$_{senet}^{c}$ | 66.02 | 81.99 | 61.59 | 77.71 | 53.4 | 55.42 | 71.89 |
| BERT$^{c}$ | 66.32 | 80.98 | 63.35 | 78.08 | 55.29 | 53.90 | 72.69 |

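The accuracy and Macro-F1 figures reported above follow standard classification metrics. As a rough illustration only, the sketch below computes both for a single binary trait label using the textbook definitions. The toy labels are made up, and this is not the paper's evaluation script, whose exact averaging across the five traits is not stated on this page.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


# Hypothetical labels for one trait (1 = high, 0 = low); not taken from CPED.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because Macro-F1 weights every class equally, it can diverge noticeably from accuracy when the label distribution is skewed.
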
Benchmark: personalized-and-emotional-conversation-on

| Methodology | Average Embedding | BLEU | Distinct-1 | Distinct-2 | Greedy Embedding | PPL | BERTScore |
|---|---|---|---|---|---|---|---|
| {emo+da}-GPT w/o emo | 0.5564 | 0.1252 | 0.0451 | 0.2746 | 0.4964 | 22.84 | 0.5666 |
| GPT-{per+emo} | 0.5617 | 0.1403 | 0.0602 | 0.3388 | 0.5026 | 17.70 | 0.5719 |
| {emo+da}-GPT | 0.5552 | 0.1304 | 0.0476 | 0.2785 | 0.4962 | 21.60 | 0.5674 |
| GPT-{per} | 0.5606 | 0.1372 | 0.0592 | 0.3363 | 0.5009 | 18.08 | 0.5715 |
| GPT-{da} | 0.5610 | 0.1372 | 0.0605 | 0.3389 | 0.5017 | 17.72 | 0.5703 |
| GPT | 0.5509 | 0.1171 | 0.0482 | 0.2738 | 0.4922 | 20.07 | 0.5629 |
| {emo+da}-GPT w/o da | 0.5556 | 0.1272 | 0.0473 | 0.2790 | 0.4962 | 22.09 | 0.5669 |
| GPT-{per+emo+da} | 0.5608 | 0.1382 | 0.0601 | 0.3404 | 0.5012 | 17.80 | 0.5722 |
| GPT-{emo} | 0.5588 | 0.1342 | 0.0614 | 0.3430 | 0.4996 | 17.48 | 0.5709 |
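
Distinct-1 and Distinct-2 in the generation results above are diversity metrics. A minimal sketch of the usual corpus-level definition (unique n-grams divided by total n-grams over all generated responses) follows. The character-level tokenization and the toy responses are assumptions for illustration, and the paper's own implementation may differ in tokenization or averaging.

```python
from collections import Counter
from typing import Iterable, Sequence


def distinct_n(responses: Iterable[Sequence[str]], n: int) -> float:
    """Ratio of unique n-grams to total n-grams across all tokenized responses."""
    counts: Counter = Counter()
    for tokens in responses:
        # Slide a window of width n over each response.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


# Made-up responses, split into characters (an assumed tokenization for Chinese).
responses = [list("我很好"), list("我很开心")]
print(f"Distinct-1 = {distinct_n(responses, 1):.4f}")
print(f"Distinct-2 = {distinct_n(responses, 2):.4f}")
```

Higher values indicate less repetition across the generated responses; a model that emits the same generic reply every time scores close to zero.
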
