Anthropic

Abstract
We introduce Claude 3, a new family of large multimodal models – Claude 3 Opus, our most capable offering, Claude 3 Sonnet, which provides a combination of skills and speed, and Claude 3 Haiku, our fastest and least expensive model. All new models have vision capabilities that enable them to process and analyze image data. The Claude 3 family demonstrates strong performance across benchmark evaluations and sets a new standard on measures of reasoning, math, and coding. Claude 3 Opus achieves state-of-the-art results on evaluations like GPQA [1], MMLU [2], MMMU [3] and many more. Claude 3 Haiku performs as well as or better than Claude 2 [4] on most pure-text tasks, while Sonnet and Opus significantly outperform it. Additionally, these models exhibit improved fluency in non-English languages, making them more versatile for a global audience. In this report, we provide an in-depth analysis of our evaluations, focusing on core capabilities, safety, societal impacts, and the catastrophic risk assessments we committed to in our Responsible Scaling Policy.
Benchmarks
| Benchmark | Model (setting) | Metrics |
|---|---|---|
| arithmetic-reasoning-on-gsm8k | Claude 3 Sonnet (0-shot chain-of-thought) | Accuracy: 92.3 |
| arithmetic-reasoning-on-gsm8k | Claude 3 Haiku (0-shot chain-of-thought) | Accuracy: 88.9 |
| arithmetic-reasoning-on-gsm8k | Claude 3 Opus (0-shot chain-of-thought) | Accuracy: 95 |
| code-generation-on-mbpp | Claude 3 Haiku | Accuracy: 80.4 |
| code-generation-on-mbpp | Claude 3 Sonnet | Accuracy: 79.4 |
| code-generation-on-mbpp | Claude 3 Opus | Accuracy: 86.4 |
| common-sense-reasoning-on-winogrande | Claude 3 Opus (5-shot) | Accuracy: 88.5 |
| common-sense-reasoning-on-winogrande | Claude 3 Sonnet (5-shot) | Accuracy: 75.1 |
| common-sense-reasoning-on-winogrande | Claude 3 Haiku (5-shot) | Accuracy: 74.2 |
| long-context-understanding-on-mmneedle | Claude 3 Opus | 1 Image, 2×2 Stitching, Exact Accuracy: 52.25; 1 Image, 4×4 Stitching, Exact Accuracy: 12.3; 1 Image, 8×8 Stitching, Exact Accuracy: 1.6; 10 Images, 1×1 Stitching, Exact Accuracy: 66.93; 10 Images, 2×2 Stitching, Exact Accuracy: 4.6; 10 Images, 4×4 Stitching, Exact Accuracy: 0.4; 10 Images, 8×8 Stitching, Exact Accuracy: 0 |
| multi-task-language-understanding-on-mmlu | Claude 3 Haiku (5-shot) | Average (%): 75.2 |
| multi-task-language-understanding-on-mmlu | Claude 3 Sonnet (5-shot) | Average (%): 79 |
| question-answering-on-pubmedqa | Claude 3 Opus (5-shot) | Accuracy: 75.8 |
| question-answering-on-pubmedqa | Claude 3 Opus (zero-shot) | Accuracy: 74.9 |