HyperAI

BloombergGPT: A Large Language Model for Finance

Shijie Wu; Ozan Irsoy; Steven Lu; Vadim Dabravolski; Mark Dredze; Sebastian Gehrmann; Prabhanjan Kambadur; David Rosenberg; Gideon Mann

Abstract

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in the literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.
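
The abstract's two corpus figures imply a near-even split between financial and general-purpose text. The following is a minimal Python sketch of that arithmetic only. It assumes the 363 billion and 345 billion token counts together make up the whole dataset, and the names `mixture_shares`, `FINANCIAL_TOKENS_B` and `GENERAL_TOKENS_B` are hypothetical. It does not model how the data was sampled or how many tokens the model actually saw during training.

```python
# Token counts quoted in the abstract, in billions of tokens.
# Assumption: these two figures are the entire dataset; sampling weights are not modeled.
FINANCIAL_TOKENS_B = 363  # Bloomberg-sourced financial data
GENERAL_TOKENS_B = 345    # general-purpose datasets


def mixture_shares(financial: float, general: float) -> dict[str, float]:
    """Return the combined size and each source's fraction of the combined dataset."""
    total = financial + general
    return {
        "total_b": total,
        "financial_share": financial / total,
        "general_share": general / total,
    }


if __name__ == "__main__":
    shares = mixture_shares(FINANCIAL_TOKENS_B, GENERAL_TOKENS_B)
    print(f"total: {shares['total_b']} B tokens")
    print(f"financial: {shares['financial_share']:.1%}")
    print(f"general: {shares['general_share']:.1%}")
```

Running it prints a combined size of 708 billion tokens, with about 51.3% financial and 48.7% general-purpose data.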

Code Repositories

yangletliu/finlora (PyTorch, mentioned on GitHub)
open-finance-lab/finlora (PyTorch, mentioned on GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| common-sense-reasoning-on-arc-challenge | BLOOM 176B (1-shot) | Accuracy: 50.85 |
| common-sense-reasoning-on-arc-challenge | BloombergGPT 50B (1-shot) | Accuracy: 48.63 |
| common-sense-reasoning-on-arc-challenge | GPT-NeoX 20B (1-shot) | Accuracy: 45.39 |
| common-sense-reasoning-on-arc-challenge | OPT 66B (1-shot) | Accuracy: 44.54 |
| common-sense-reasoning-on-arc-easy | GPT-NeoX 20B (1-shot) | Accuracy: 70.79 |
| common-sense-reasoning-on-arc-easy | BloombergGPT 50B (1-shot) | Accuracy: 73.99 |
| common-sense-reasoning-on-arc-easy | OPT 66B (1-shot) | Accuracy: 71.25 |
| common-sense-reasoning-on-arc-easy | BLOOM 176B (1-shot) | Accuracy: 75.93 |
| common-sense-reasoning-on-big-bench | BLOOM 176B (few-shot, k=3) | Accuracy: 40.4 |
| common-sense-reasoning-on-big-bench | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 40.8 |
| common-sense-reasoning-on-big-bench | BloombergGPT 50B (few-shot, k=3) | Accuracy: 34 |
| common-sense-reasoning-on-big-bench | PaLM 540B (few-shot, k=3) | Accuracy: 60.8 |
| common-sense-reasoning-on-big-bench | OPT 66B (few-shot, k=3) | Accuracy: 40.4 |
| common-sense-reasoning-on-big-bench-causal | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 52.41 |
| common-sense-reasoning-on-big-bench-causal | BloombergGPT 50B (few-shot, k=3) | Accuracy: 49.73 |
| common-sense-reasoning-on-big-bench-causal | OPT 66B (few-shot, k=3) | Accuracy: 51.87 |
| common-sense-reasoning-on-big-bench-causal | PaLM 540B (few-shot, k=3) | Accuracy: 61.0 |
| common-sense-reasoning-on-big-bench-causal | BLOOM 176B (few-shot, k=3) | Accuracy: 51.87 |
| common-sense-reasoning-on-big-bench-date | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 45.60 |
| common-sense-reasoning-on-big-bench-date | PaLM 540B (few-shot, k=3) | Accuracy: 53.6 |
| common-sense-reasoning-on-big-bench-date | OPT 66B (few-shot, k=3) | Accuracy: 49.60 |
| common-sense-reasoning-on-big-bench-date | BloombergGPT 50B (few-shot, k=3) | Accuracy: 54.8 |
| common-sense-reasoning-on-big-bench-date | BLOOM 176B (few-shot, k=3) | Accuracy: 50.00 |
| common-sense-reasoning-on-big-bench-sports | OPT 66B (few-shot, k=3) | Accuracy: 54.4 |
| common-sense-reasoning-on-big-bench-sports | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 53.2 |
| common-sense-reasoning-on-big-bench-sports | BloombergGPT 50B (few-shot, k=3) | Accuracy: 62.8 |
| common-sense-reasoning-on-big-bench-sports | PaLM 540B (few-shot, k=3) | Accuracy: 80.4 |
| common-sense-reasoning-on-commonsenseqa | OPT 66B (1-shot) | Accuracy: 66.4 |
| common-sense-reasoning-on-commonsenseqa | BLOOM 176B (1-shot) | Accuracy: 64.2 |
| common-sense-reasoning-on-commonsenseqa | GPT-NeoX 20B (1-shot) | Accuracy: 60.4 |
| common-sense-reasoning-on-commonsenseqa | BloombergGPT 50B (1-shot) | Accuracy: 65.5 |
| common-sense-reasoning-on-record | OPT 66B (1-shot) | F1: 82.5 |
| common-sense-reasoning-on-record | BloombergGPT 50B (1-shot) | F1: 82.8 |
| common-sense-reasoning-on-record | GPT-NeoX 20B (1-shot) | F1: 67.9 |
| common-sense-reasoning-on-record | BLOOM 176B (1-shot) | F1: 78 |
| common-sense-reasoning-on-winogrande | OPT 66B (1-shot) | Accuracy: 66.1 |
| common-sense-reasoning-on-winogrande | BloombergGPT 50B (1-shot) | Accuracy: 64.1 |
| common-sense-reasoning-on-winogrande | BLOOM 176B (1-shot) | Accuracy: 67 |
| common-sense-reasoning-on-winogrande | GPT-NeoX 20B (1-shot) | Accuracy: 60.6 |
| logical-reasoning-on-big-bench-formal | PaLM 540B (few-shot, k=3) | Accuracy: 53.6 |
| logical-reasoning-on-big-bench-formal | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 52.8 |
| logical-reasoning-on-big-bench-formal | OPT 66B (few-shot, k=3) | Accuracy: 54 |
| logical-reasoning-on-big-bench-formal | BLOOM 176B (few-shot, k=3) | Accuracy: 52.8 |
| logical-reasoning-on-big-bench-formal | BloombergGPT 50B (few-shot, k=3) | Accuracy: 50.8 |
| logical-reasoning-on-big-bench-penguins-in-a | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 33.56 |
| logical-reasoning-on-big-bench-penguins-in-a | OPT 66B (few-shot, k=3) | Accuracy: 28.08 |
| logical-reasoning-on-big-bench-penguins-in-a | BLOOM 176B (few-shot, k=3) | Accuracy: 40.41 |
| logical-reasoning-on-big-bench-penguins-in-a | BloombergGPT 50B (few-shot, k=3) | Accuracy: 37.67 |
| logical-reasoning-on-big-bench-penguins-in-a | PaLM 540B (few-shot, k=3) | Accuracy: 44.5 |
| logical-reasoning-on-big-bench-reasoning | PaLM 540B (few-shot, k=3) | Accuracy: 38 |
| logical-reasoning-on-big-bench-reasoning | BLOOM 176B (few-shot, k=3) | Accuracy: 36.8 |
| logical-reasoning-on-big-bench-reasoning | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 26 |
| logical-reasoning-on-big-bench-reasoning | OPT 66B (few-shot, k=3) | Accuracy: 31.2 |
| logical-reasoning-on-big-bench-reasoning | BloombergGPT 50B (few-shot, k=3) | Accuracy: 34.8 |
| logical-reasoning-on-big-bench-temporal | BloombergGPT 50B (few-shot, k=3) | Accuracy: 29.2 |
| logical-reasoning-on-big-bench-temporal | OPT 66B (few-shot, k=3) | Accuracy: 23.6 |
| logical-reasoning-on-big-bench-temporal | PaLM 540B (few-shot, k=3) | Accuracy: 39.6 |
| logical-reasoning-on-big-bench-temporal | BLOOM 176B (few-shot, k=3) | Accuracy: 36.8 |
| logical-reasoning-on-big-bench-temporal | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 21.2 |
| multi-task-language-understanding-on-mmlu | BloombergGPT 50B (5-shot) | Average (%): 39.2 |
| multi-task-language-understanding-on-mmlu | BLOOM 176B (5-shot) | Average (%): 39.1 |
| multi-task-language-understanding-on-mmlu | OPT 66B (5-shot) | Average (%): 36 |
| multiple-choice-question-answering-mcqa-on-27 | BLOOM 176B (few-shot, k=3) | Accuracy: 92 |
| multiple-choice-question-answering-mcqa-on-27 | OPT 66B (few-shot, k=3) | Accuracy: 91.6 |
| multiple-choice-question-answering-mcqa-on-27 | BloombergGPT 50B (few-shot, k=3) | Accuracy: 92 |
| multiple-choice-question-answering-mcqa-on-27 | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 92 |
| multiple-choice-question-answering-mcqa-on-27 | PaLM 540B (few-shot, k=3) | Accuracy: 70.8 |
| multiple-choice-question-answering-mcqa-on-28 | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 86.4 |
| multiple-choice-question-answering-mcqa-on-28 | OPT 66B (few-shot, k=3) | Accuracy: 91.2 |
| multiple-choice-question-answering-mcqa-on-28 | BLOOM 176B (few-shot, k=3) | Accuracy: 91.2 |
| multiple-choice-question-answering-mcqa-on-28 | BloombergGPT 50B (few-shot, k=3) | Accuracy: 90.4 |
| multiple-choice-question-answering-mcqa-on-28 | PaLM 540B (few-shot, k=3) | Accuracy: 87.2 |
| multiple-choice-question-answering-mcqa-on-29 | BloombergGPT 50B (few-shot, k=3) | Accuracy: 42 |
| multiple-choice-question-answering-mcqa-on-29 | BLOOM 176B (few-shot, k=3) | Accuracy: 50 |
| multiple-choice-question-answering-mcqa-on-29 | PaLM 540B (few-shot, k=3) | Accuracy: 62.4 |
| multiple-choice-question-answering-mcqa-on-29 | OPT 66B (few-shot, k=3) | Accuracy: 42 |
| multiple-choice-question-answering-mcqa-on-29 | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 45.2 |
| multiple-choice-question-answering-mcqa-on-30 | BLOOM 176B (few-shot, k=3) | Accuracy: 54.8 |
| multiple-choice-question-answering-mcqa-on-30 | BloombergGPT 50B (few-shot, k=3) | Accuracy: 56 |
| multiple-choice-question-answering-mcqa-on-30 | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 54 |
| multiple-choice-question-answering-mcqa-on-30 | PaLM 540B (few-shot, k=3) | Accuracy: 76 |
| multiple-choice-question-answering-mcqa-on-30 | OPT 66B (few-shot, k=3) | Accuracy: 52.8 |
| natural-language-inference-on-anli-test | BLOOM 176B (1-shot) | A1: 33.6; A2: 33.8; A3: 35.17 |
| natural-language-inference-on-anli-test | OPT 66B (1-shot) | A1: 33.1; A2: 34.2; A3: 34.92 |
| natural-language-inference-on-anli-test | GPT-NeoX 20B (1-shot) | A1: 32.6; A2: 33.8; A3: 36.17 |
| natural-language-inference-on-anli-test | BloombergGPT 50B (1-shot) | A1: 32.9; A2: 34.4; A3: 37.33 |
| natural-language-inference-on-commitmentbank | OPT 66B (1-shot) | Accuracy: 44.64 |
| natural-language-inference-on-commitmentbank | GPT-NeoX 20B (1-shot) | Accuracy: 48.21 |
| natural-language-inference-on-commitmentbank | BLOOM 176B (1-shot) | Accuracy: 48.21 |
| natural-language-inference-on-commitmentbank | BloombergGPT 50B (1-shot) | Accuracy: 53.57 |
| natural-language-inference-on-rte | GPT-NeoX 20B (1-shot) | Accuracy: 53.8% |
| natural-language-inference-on-rte | BloombergGPT 50B (1-shot) | Accuracy: 69.3% |
| natural-language-inference-on-rte | OPT 66B (1-shot) | Accuracy: 54.9% |
| natural-language-inference-on-rte | BLOOM 176B (1-shot) | Accuracy: 57.4% |
| question-answering-on-boolq | BloombergGPT 50B (1-shot) | Accuracy: 74.6 |
| question-answering-on-boolq | GPT-NeoX 20B (1-shot) | Accuracy: 46.4 |
| question-answering-on-boolq | OPT 66B (1-shot) | Accuracy: 57.5 |
| question-answering-on-boolq | BLOOM 176B (1-shot) | Accuracy: 52.9 |
| question-answering-on-copa | BLOOM 176B (1-shot) | Accuracy: 84 |
| question-answering-on-copa | OPT 66B (1-shot) | Accuracy: 86 |
| question-answering-on-copa | GPT-NeoX 20B (1-shot) | Accuracy: 88 |
| question-answering-on-copa | BloombergGPT 50B (1-shot) | Accuracy: 86 |
| question-answering-on-multirc | BLOOM 176B (1-shot) | F1: 26.7 |
| question-answering-on-multirc | GPT-NeoX 20B (1-shot) | F1: 22.9 |
| question-answering-on-multirc | OPT 66B (1-shot) | F1: 18.8 |
| question-answering-on-multirc | BloombergGPT 50B (1-shot) | F1: 62.3 |
| question-answering-on-openbookqa | BLOOM 176B (2-shot) | Accuracy: 47.2 |
| question-answering-on-openbookqa | BloombergGPT 50B (1-shot) | Accuracy: 51.6 |
| question-answering-on-openbookqa | GPT-NeoX 20B (2-shot) | Accuracy: 44.2 |
| question-answering-on-openbookqa | OPT 66B (1-shot) | Accuracy: 58.0 |
| question-answering-on-piqa | OPT 66B (1-shot) | Accuracy: 77.6 |
| question-answering-on-piqa | GPT-NeoX 20B (1-shot) | Accuracy: 75.8 |
| question-answering-on-piqa | BloombergGPT 50B (1-shot) | Accuracy: 77.9 |
| question-answering-on-piqa | BLOOM 176B (1-shot) | Accuracy: 77 |
| reading-comprehension-on-race | BLOOM 176B (1-shot) | Accuracy (High): 39.14; Accuracy (Middle): 52.3 |
| reading-comprehension-on-race | GPT-NeoX 20B (1-shot) | Accuracy (High): 34.33; Accuracy (Middle): 41.23 |
| reading-comprehension-on-race | OPT 66B (1-shot) | Accuracy (High): 37.02; Accuracy (Middle): 47.42 |
| reading-comprehension-on-race | BloombergGPT 50B (1-shot) | Accuracy (High): 41.74; Accuracy (Middle): 54.32 |
| sarcasm-detection-on-big-bench-snarks | PaLM 540B (few-shot, k=3) | Accuracy: 78.1 |
| sarcasm-detection-on-big-bench-snarks | BloombergGPT 50B (few-shot, k=3) | Accuracy: 69.66 |
| sarcasm-detection-on-big-bench-snarks | BLOOM 176B (few-shot, k=3) | Accuracy: 72.47 |
| sarcasm-detection-on-big-bench-snarks | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 62.36 |
