Anthropic
Abstract
This report includes the model card [1] for Claude models, focusing on Claude 2, along with the results of a range of safety, alignment, and capabilities evaluations. We have been iterating on the training and evaluation of Claude-type models since our first work on Reinforcement Learning from Human Feedback (RLHF) [2]; the newest Claude 2 model represents a continuous evolution from those early and less capable ‘helpful and harmless’ language assistants.

This report is not intended to be a scientific paper, since most aspects of training and evaluating these models have been documented in our research papers. These include papers on preference modeling [3], reinforcement learning from human feedback for helpful and harmless models [2], red teaming language models [4], measuring representation of subjective global values in language models [5], honesty (i.e., exploring language models’ ability to recognize what they know) [6], evaluating language models with language model-generated tests [7], moral self-correction [8], and Constitutional AI [9]. We also discussed Claude’s specific constitution in a recent blog post [10]. Our work using human evaluations to test model safety is most thoroughly documented in our paper “Red-Teaming Language Models to Reduce Harms” [4], while our recent work on automated safety evaluation is “Discovering Language Model Behaviors with Model-Written Evaluations” [7].

This report is also not comprehensive; we expect to release new findings as we continue our research and evaluations of frontier models. However, we hope it provides useful insight into Claude 2’s capabilities and limitations.
Benchmarks
| Benchmark | Model (setting) | Metric | Score |
|---|---|---|---|
| GSM8K (arithmetic reasoning) | Claude Instant 1.1 (0-shot chain-of-thought) | Accuracy (%) | 80.9 |
| GSM8K (arithmetic reasoning) | Claude 1.3 (0-shot chain-of-thought) | Accuracy (%) | 85.2 |
| GSM8K (arithmetic reasoning) | Claude 2 (0-shot chain-of-thought) | Accuracy (%) | 88 |
| ARC-Challenge (common-sense reasoning) | Claude Instant 1.1 (few-shot, k=5) | Accuracy (%) | 85.7 |
| ARC-Challenge (common-sense reasoning) | Claude 1.3 (few-shot, k=5) | Accuracy (%) | 90 |
| ARC-Challenge (common-sense reasoning) | Claude 2 (few-shot, k=5) | Accuracy (%) | 91 |
| MMLU (multi-task language understanding) | Claude Instant 1.1 (5-shot) | Average (%) | 73.4 |
| QuALITY (question answering) | Claude Instant 1.1 (5-shot) | Accuracy (%) | 80.5 |
| QuALITY (question answering) | Claude 2 (5-shot) | Accuracy (%) | 83.2 |
| QuALITY (question answering) | Claude 1.3 (5-shot) | Accuracy (%) | 84.1 |
| TriviaQA (question answering) | Claude Instant 1.1 (few-shot, k=5) | EM | 78.9 |
| TriviaQA (question answering) | Claude 1.3 (few-shot, k=5) | EM | 86.7 |
| TriviaQA (question answering) | Claude 2 (few-shot, k=5) | EM | 87.5 |
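The report does not publish its evaluation harness, but the few-shot exact-match (EM) setup reported for TriviaQA above typically works as sketched below. This is a minimal sketch, not the actual methodology: `query_model`, `few_shot_examples`, and `test_set` are hypothetical placeholders for the real model API call and dataset.

```python
import re
import string


def build_few_shot_prompt(examples, question):
    """Concatenate k solved Q/A examples, then the held-out question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace
    (the usual normalization applied before exact-match scoring)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """EM = 1 if the normalized prediction equals any normalized gold alias."""
    pred = normalize(prediction)
    return float(any(pred == normalize(g) for g in gold_answers))


def evaluate(query_model, few_shot_examples, test_set):
    """Mean EM over (question, gold_answers) pairs.

    `query_model(prompt) -> str` is a placeholder for whatever API call
    serves the model under test.
    """
    scores = [
        exact_match(query_model(build_few_shot_prompt(few_shot_examples, q)), golds)
        for q, golds in test_set
    ]
    return sum(scores) / len(scores)
```

The 0-shot chain-of-thought setting used for GSM8K differs in that no solved examples are included: the prompt instead asks the model to reason step by step, and only the final numeric answer is extracted and compared for accuracy.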