USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Shikib Mehri Maxine Eskenazi

Abstract

The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR trains unsupervised models to measure several desirable qualities of dialog, so it requires no reference responses. USR is shown to correlate strongly with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48, system-level: 1.0). USR additionally produces interpretable measures for each of these qualities.

Code Repositories

shikib/usr (official, PyTorch)

Benchmarks

Benchmark                              | Methodology      | Pearson Correlation | Spearman Correlation
---------------------------------------|------------------|---------------------|---------------------
dialogue-evaluation-on-usr-personachat | USR - MLM        |  0.0788             |  0.0795
dialogue-evaluation-on-usr-personachat | USR - DR (x = f) | -0.0454             | -0.0495
dialogue-evaluation-on-usr-personachat | USR - DR (x = c) |  0.6087             |  0.4814
dialogue-evaluation-on-usr-personachat | USR              |  0.4115             |  0.4693
dialogue-evaluation-on-usr-topicalchat | USR              |  0.4220             |  0.4192
dialogue-evaluation-on-usr-topicalchat | USR - DR (x = c) |  0.4068             |  0.3245
dialogue-evaluation-on-usr-topicalchat | USR - DR (x = f) |  0.3221             |  0.1419
dialogue-evaluation-on-usr-topicalchat | USR - MLM        |  0.3345             |  0.3086
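
The benchmark figures above are Pearson and Spearman correlations between metric scores and human ratings, and the abstract distinguishes turn-level from system-level correlation. The following is a minimal sketch of how such numbers can be computed, not the code from the official repository. The function names (pearson, average_ranks, spearman, system_level) are hypothetical, and the toy scores are made up for illustration; they are not taken from the paper. The sketch assumes that system-level correlation means averaging scores per dialog system before correlating.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(xs), average_ranks(ys))


def system_level(systems, metric_scores, human_scores):
    """Average both score lists per system (assumed aggregation, see lead-in)."""
    buckets = defaultdict(lambda: ([], []))
    for name, m, h in zip(systems, metric_scores, human_scores):
        buckets[name][0].append(m)
        buckets[name][1].append(h)
    names = sorted(buckets)
    metric_means = [sum(buckets[n][0]) / len(buckets[n][0]) for n in names]
    human_means = [sum(buckets[n][1]) / len(buckets[n][1]) for n in names]
    return metric_means, human_means


# Made-up toy data: two responses from each of three hypothetical systems.
systems = ["A", "A", "B", "B", "C", "C"]
metric_scores = [0.2, 0.4, 0.5, 0.7, 0.8, 0.9]  # automatic metric, per turn
human_scores = [1, 3, 2, 4, 4, 5]               # human rating, per turn

turn_pearson = pearson(metric_scores, human_scores)
turn_spearman = spearman(metric_scores, human_scores)

sys_metric, sys_human = system_level(systems, metric_scores, human_scores)
sys_spearman = spearman(sys_metric, sys_human)

print(f"turn-level   Pearson={turn_pearson:.4f} Spearman={turn_spearman:.4f}")
print(f"system-level Spearman={sys_spearman:.4f}")
```

On this toy data the per-system averages are ordered identically by the metric and by the human ratings, so the system-level rank correlation is perfect even though the turn-level one is not. With only a handful of systems, a system-level correlation of 1.0 is much easier to reach than a high turn-level correlation, which is worth keeping in mind when reading the two kinds of figures side by side.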
