HyperAI


3 months ago

EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models

Samuel J. Paech


Abstract

We introduce EQ-Bench, a novel benchmark designed to evaluate aspects of emotional intelligence in Large Language Models (LLMs). We assess the ability of LLMs to understand complex emotions and social interactions by asking them to predict the intensity of emotional states of characters in a dialogue. The benchmark discriminates effectively between a wide range of models. We find that EQ-Bench correlates strongly with comprehensive multi-domain benchmarks such as MMLU (Hendrycks et al., 2020) (r = 0.97), suggesting that it may capture similar aspects of broad intelligence. Using a set of 60 English-language questions, the benchmark produces highly repeatable results. We also provide open-source code for an automated benchmarking pipeline at https://github.com/EQ-bench/EQ-Bench and a leaderboard at https://eqbench.com.
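The abstract describes two quantitative steps: scoring a model's predicted emotion intensities against reference answers, and correlating the resulting benchmark scores with MMLU. The sketch below illustrates both under stated assumptions. The question format (named emotions rated on a 0 to 10 scale), the scoring rule (10 minus the total absolute deviation, floored at 0, rescaled to 0 to 100), and the function names `question_score`, `benchmark_score` and `pearson_r` are all hypothetical; this page does not give EQ-Bench's actual scoring formula, which may differ (for example, in how ratings are normalized). The correlation function assumes r denotes the standard Pearson product-moment coefficient.

```python
from __future__ import annotations

from math import sqrt


def question_score(predicted: dict[str, float], reference: dict[str, float]) -> float:
    """Score one question as 10 minus the total absolute deviation, floored at 0.

    HYPOTHETICAL rule for illustration only; not EQ-Bench's published formula.
    """
    if predicted.keys() != reference.keys():
        raise ValueError("predicted and reference must rate the same emotions")
    deviation = sum(abs(predicted[e] - reference[e]) for e in reference)
    return max(0.0, 10.0 - deviation)


def benchmark_score(questions: list[tuple[dict[str, float], dict[str, float]]]) -> float:
    """Average the per-question scores and rescale them to a 0-100 range (assumed scale)."""
    if not questions:
        raise ValueError("no questions to score")
    total = sum(question_score(pred, ref) for pred, ref in questions)
    return 100.0 * total / (10.0 * len(questions))


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of per-model scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical emotions and ratings for a single dialogue question.
    reference = {"anger": 7, "relief": 2, "guilt": 0, "hope": 5}
    predicted = {"anger": 6, "relief": 2, "guilt": 1, "hope": 5}
    print(question_score(predicted, reference))      # total deviation 2, prints 8.0
    print(benchmark_score([(predicted, reference)]))  # prints 80.0
    print(pearson_r([1, 2, 3], [2, 4, 6]))           # perfectly linear, prints 1.0
```

In a real comparison, `xs` would hold each model's EQ-Bench score and `ys` the same models' MMLU scores, in matching order.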

Code Repositories

eq-bench/eq-bench (Official, PyTorch)

Benchmarks

All results below are on the benchmark emotional-intelligence-on-emotional; the metric is the EQ-Bench Score.

| Methodology (model) | EQ-Bench Score |
|---|---|
| OpenAI gpt-3.5-0613 | 49.17 |
| lmsys/vicuna-33b-v1.3 | 36.52 |
| lmsys/vicuna-13b-v1.1 | 32.85 |
| OpenAI text-davinci-002 | 39.44 |
| OpenAI text-davinci-003 | 43.73 |
| meta-llama/Llama-2-70b-chat-hf | 51.56 |
| OpenAI ADA | 2.25 |
| meta-llama/Llama-2-7b-chat-hf | 25.43 |
| OpenAI gpt-3.5-turbo-0301 | 47.61 |
| Intel/neural-chat-7b-v3-1 | 43.61 |
| Qwen/Qwen-72B-Chat | 52.44 |
| openchat/openchat 3.5 | 37.08 |
| migtissera/SynthIA-70B-v1.5 | 54.83 |
| Open-Orca/Mistral-7B-OpenOrca | 44.40 |
| OpenAI gpt-4-0613 | 62.52 |
| OpenAI gpt-4-0314 | 53.39 |
| Qwen/Qwen-14B-Chat | 43.76 |
| Koala 13B | 24.92 |
| meta-llama/Llama-2-13b-chat-hf | 33.02 |
| Anthropic Claude2 | 52.14 |
| 01-ai/Yi-34B-Chat | 51.03 |
| lmsys/vicuna-7b-v1.1 | 22.24 |
| OpenAI text-davinci-001 | 15.19 |
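To make the spread of results easier to see, the scores from the table can be ranked and the gap between the best and worst model computed. This is a small illustrative sketch: the scores are copied from the listing (with the duplicated OpenAI ADA row counted once), while the names `SCORES`, `ranked` and `score_spread` are hypothetical and not part of the official EQ-Bench pipeline.

```python
from __future__ import annotations

# EQ-Bench scores copied from the leaderboard listing above.
SCORES: dict[str, float] = {
    "OpenAI gpt-3.5-0613": 49.17,
    "lmsys/vicuna-33b-v1.3": 36.52,
    "lmsys/vicuna-13b-v1.1": 32.85,
    "OpenAI text-davinci-002": 39.44,
    "OpenAI text-davinci-003": 43.73,
    "meta-llama/Llama-2-70b-chat-hf": 51.56,
    "OpenAI ADA": 2.25,
    "meta-llama/Llama-2-7b-chat-hf": 25.43,
    "OpenAI gpt-3.5-turbo-0301": 47.61,
    "Intel/neural-chat-7b-v3-1": 43.61,
    "Qwen/Qwen-72B-Chat": 52.44,
    "openchat/openchat 3.5": 37.08,
    "migtissera/SynthIA-70B-v1.5": 54.83,
    "Open-Orca/Mistral-7B-OpenOrca": 44.40,
    "OpenAI gpt-4-0613": 62.52,
    "OpenAI gpt-4-0314": 53.39,
    "Qwen/Qwen-14B-Chat": 43.76,
    "Koala 13B": 24.92,
    "meta-llama/Llama-2-13b-chat-hf": 33.02,
    "Anthropic Claude2": 52.14,
    "01-ai/Yi-34B-Chat": 51.03,
    "lmsys/vicuna-7b-v1.1": 22.24,
    "OpenAI text-davinci-001": 15.19,
}


def ranked(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def score_spread(scores: dict[str, float]) -> float:
    """Difference between the highest and the lowest score."""
    return max(scores.values()) - min(scores.values())


if __name__ == "__main__":
    for rank, (model, score) in enumerate(ranked(SCORES), start=1):
        print(f"{rank:2d}. {model:<32} {score:6.2f}")
    print(f"spread: {score_spread(SCORES):.2f}")
```

Sorting with a key function keeps the original dictionary untouched, so the same data can also be fed to other summaries without re-parsing the table.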
