4 months ago

BloombergGPT: A Large Language Model for Finance

Abstract

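The use of natural language processing (NLP) in financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large language models (LLMs) have proven effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in the literature. In this work, we present BloombergGPT, a 50-billion-parameter language model trained on a wide range of financial data. We construct a 363-billion-token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset to date, and augment it with 345 billion tokens from general-purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Training on the mixed dataset yields a model that significantly outperforms existing models on financial tasks without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C), which document our experience in training BloombergGPT.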
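To make the data mix in the abstract concrete, the short sketch below works out the combined corpus size and the share of each part from the two token counts quoted above. The function name `mixture_shares` and the constant names are illustrative assumptions and do not come from the paper.

```python
# Token counts quoted in the abstract, in billions of tokens.
FINANCIAL_TOKENS_B = 363  # Bloomberg-derived financial data
GENERAL_TOKENS_B = 345    # general-purpose datasets


def mixture_shares(financial_b, general_b):
    """Return (total, financial share, general share) for a two-part corpus.

    Hypothetical helper for illustration only.
    """
    total = financial_b + general_b
    return total, financial_b / total, general_b / total


total, fin_share, gen_share = mixture_shares(FINANCIAL_TOKENS_B, GENERAL_TOKENS_B)
print(f"total: {total}B tokens")
print(f"financial: {fin_share:.1%}, general: {gen_share:.1%}")
```

Under these figures the corpus is split roughly evenly, with slightly more than half of the tokens coming from financial sources.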

Code Repositories

yangletliu/finlora
pytorch
Mentioned in GitHub
open-finance-lab/finlora
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Method | Metric
common-sense-reasoning-on-arc-challenge | BLOOM 176B (1-shot) | Accuracy: 50.85
common-sense-reasoning-on-arc-challenge | Bloomberg GPT 50B (1-shot) | Accuracy: 48.63
common-sense-reasoning-on-arc-challenge | GPT-NeoX 20B (1-shot) | Accuracy: 45.39
common-sense-reasoning-on-arc-challenge | OPT 66B (1-shot) | Accuracy: 44.54
common-sense-reasoning-on-arc-easy | GPT-NeoX 20B (1-shot) | Accuracy: 70.79
common-sense-reasoning-on-arc-easy | Bloomberg GPT 50B (1-shot) | Accuracy: 73.99
common-sense-reasoning-on-arc-easy | OPT 66B (1-shot) | Accuracy: 71.25
common-sense-reasoning-on-arc-easy | BLOOM 176B (1-shot) | Accuracy: 75.93
common-sense-reasoning-on-big-bench | BLOOM 176B (few-shot, k=3) | Accuracy: 40.4
common-sense-reasoning-on-big-bench | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 40.8
common-sense-reasoning-on-big-bench | Bloomberg GPT 50B (few-shot, k=3) | Accuracy: 34
common-sense-reasoning-on-big-bench | PaLM 540B (few-shot, k=3) | Accuracy: 60.8
common-sense-reasoning-on-big-bench | OPT 66B (few-shot, k=3) | Accuracy: 40.4
common-sense-reasoning-on-big-bench-causal | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 52.41
common-sense-reasoning-on-big-bench-causal | BloombergGPT 50B (few-shot, k=3) | Accuracy: 49.73
common-sense-reasoning-on-big-bench-causal | OPT 66B (few-shot, k=3) | Accuracy: 51.87
common-sense-reasoning-on-big-bench-causal | PaLM 540B (few-shot, k=3) | Accuracy: 61.0
common-sense-reasoning-on-big-bench-causal | BLOOM 176B (few-shot, k=3) | Accuracy: 51.87
common-sense-reasoning-on-big-bench-date | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 45.60
common-sense-reasoning-on-big-bench-date | PaLM 540B (few-shot, k=3) | Accuracy: 53.6
common-sense-reasoning-on-big-bench-date | OPT 66B (few-shot, k=3) | Accuracy: 49.60
common-sense-reasoning-on-big-bench-date | Bloomberg GPT 50B (few-shot, k=3) | Accuracy: 54.8
common-sense-reasoning-on-big-bench-date | BLOOM 176B (few-shot, k=3) | Accuracy: 50.00
common-sense-reasoning-on-big-bench-sports | OPT 66B (few-shot, k=3) | Accuracy: 54.4
common-sense-reasoning-on-big-bench-sports | GPT-NeoX (few-shot, k=3) | Accuracy: 53.2
common-sense-reasoning-on-big-bench-sports | Bloomberg GPT (few-shot, k=3) | Accuracy: 62.8
common-sense-reasoning-on-big-bench-sports | PaLM 540B (few-shot, k=3) | Accuracy: 80.4
common-sense-reasoning-on-commonsenseqa | OPT 66B (1-shot) | Accuracy: 66.4
common-sense-reasoning-on-commonsenseqa | BLOOM 176B (1-shot) | Accuracy: 64.2
common-sense-reasoning-on-commonsenseqa | GPT-NeoX 20B (1-shot) | Accuracy: 60.4
common-sense-reasoning-on-commonsenseqa | Bloomberg GPT 50B (1-shot) | Accuracy: 65.5
common-sense-reasoning-on-record | OPT 66B (1-shot) | F1: 82.5
common-sense-reasoning-on-record | Bloomberg GPT 50B (1-shot) | F1: 82.8
common-sense-reasoning-on-record | GPT-NeoX 20B (1-shot) | F1: 67.9
common-sense-reasoning-on-record | BLOOM 176B (1-shot) | F1: 78
common-sense-reasoning-on-winogrande | OPT 66B (1-shot) | Accuracy: 66.1
common-sense-reasoning-on-winogrande | Bloomberg GPT (1-shot) | Accuracy: 64.1
common-sense-reasoning-on-winogrande | BLOOM 176B (1-shot) | Accuracy: 67
common-sense-reasoning-on-winogrande | GPT-NeoX (1-shot) | Accuracy: 60.6
logical-reasoning-on-big-bench-formal | PaLM 540B (few-shot, k=3) | Accuracy: 53.6
logical-reasoning-on-big-bench-formal | GPT-NeoX 20B (few-shot, k=3) | Accuracy: 52.8
logical-reasoning-on-big-bench-formal | OPT 66B (few-shot, k=3) | Accuracy: 54
logical-reasoning-on-big-bench-formal | BLOOM 176B (few-shot, k=3) | Accuracy: 52.8
logical-reasoning-on-big-bench-formal | Bloomberg GPT 50B (few-shot, k=3) | Accuracy: 50.8
logical-reasoning-on-big-bench-penguins-in-a | GPT-NeoX (few-shot, k=3) | Accuracy: 33.56
logical-reasoning-on-big-bench-penguins-in-a | OPT 66B (few-shot, k=3) | Accuracy: 28.08
logical-reasoning-on-big-bench-penguins-in-a | BLOOM 176B (few-shot, k=3) | Accuracy: 40.41
logical-reasoning-on-big-bench-penguins-in-a | Bloomberg GPT (few-shot, k=3) | Accuracy: 37.67
logical-reasoning-on-big-bench-penguins-in-a | PaLM 540B (few-shot, k=3) | Accuracy: 44.5
logical-reasoning-on-big-bench-reasoning | PaLM 540B (few-shot, k=3) | Accuracy: 38
logical-reasoning-on-big-bench-reasoning | BLOOM 176B (few-shot, k=3) | Accuracy: 36.8
logical-reasoning-on-big-bench-reasoning | GPT-NeoX (few-shot, k=3) | Accuracy: 26
logical-reasoning-on-big-bench-reasoning | OPT 66B (few-shot, k=3) | Accuracy: 31.2
logical-reasoning-on-big-bench-reasoning | Bloomberg GPT (few-shot, k=3) | Accuracy: 34.8
logical-reasoning-on-big-bench-temporal | Bloomberg GPT (few-shot, k=3) | Accuracy: 29.2
logical-reasoning-on-big-bench-temporal | OPT 66B (few-shot, k=3) | Accuracy: 23.6
logical-reasoning-on-big-bench-temporal | PaLM 540B (few-shot, k=3) | Accuracy: 39.6
logical-reasoning-on-big-bench-temporal | BLOOM 176B (few-shot, k=3) | Accuracy: 36.8
logical-reasoning-on-big-bench-temporal | GPT-NeoX (few-shot, k=3) | Accuracy: 21.2
multi-task-language-understanding-on-mmlu | Bloomberg GPT 50B (5-shot) | Average (%): 39.2
multi-task-language-understanding-on-mmlu | BLOOM 176B (5-shot) | Average (%): 39.1
multi-task-language-understanding-on-mmlu | OPT 66B (5-shot) | Average (%): 36
multiple-choice-question-answering-mcqa-on-27 | BLOOM 176B (few-shot, k=3) | Accuracy: 92
multiple-choice-question-answering-mcqa-on-27 | OPT 66B (few-shot, k=3) | Accuracy: 91.6
multiple-choice-question-answering-mcqa-on-27 | Bloomberg GPT (few-shot, k=3) | Accuracy: 92
multiple-choice-question-answering-mcqa-on-27 | GPT-NeoX (few-shot, k=3) | Accuracy: 92
multiple-choice-question-answering-mcqa-on-27 | PaLM 540B (few-shot, k=3) | Accuracy: 70.8
multiple-choice-question-answering-mcqa-on-28 | GPT-NeoX (few-shot, k=3) | Accuracy: 86.4
multiple-choice-question-answering-mcqa-on-28 | OPT 66B (few-shot, k=3) | Accuracy: 91.2
multiple-choice-question-answering-mcqa-on-28 | BLOOM 176B (few-shot, k=3) | Accuracy: 91.2
multiple-choice-question-answering-mcqa-on-28 | Bloomberg GPT (few-shot, k=3) | Accuracy: 90.4
multiple-choice-question-answering-mcqa-on-28 | PaLM 540B (few-shot, k=3) | Accuracy: 87.2
multiple-choice-question-answering-mcqa-on-29 | Bloomberg GPT (few-shot, k=3) | Accuracy: 42
multiple-choice-question-answering-mcqa-on-29 | BLOOM 176B (few-shot, k=3) | Accuracy: 50
multiple-choice-question-answering-mcqa-on-29 | PaLM 540B (few-shot, k=3) | Accuracy: 62.4
multiple-choice-question-answering-mcqa-on-29 | OPT 66B (few-shot, k=3) | Accuracy: 42
multiple-choice-question-answering-mcqa-on-29 | GPT-NeoX (few-shot, k=3) | Accuracy: 45.2
multiple-choice-question-answering-mcqa-on-30 | BLOOM 176B (few-shot, k=3) | Accuracy: 54.8
multiple-choice-question-answering-mcqa-on-30 | Bloomberg GPT (few-shot, k=3) | Accuracy: 56
multiple-choice-question-answering-mcqa-on-30 | GPT-NeoX (few-shot, k=3) | Accuracy: 54
multiple-choice-question-answering-mcqa-on-30 | PaLM 540B (few-shot, k=3) | Accuracy: 76
multiple-choice-question-answering-mcqa-on-30 | OPT 66B (few-shot, k=3) | Accuracy: 52.8
natural-language-inference-on-anli-test | BLOOM 176B (1-shot) | A1: 33.6; A2: 33.8; A3: 35.17
natural-language-inference-on-anli-test | OPT 66B (1-shot) | A1: 33.1; A2: 34.2; A3: 34.92
natural-language-inference-on-anli-test | GPT-NeoX (1-shot) | A1: 32.6; A2: 33.8; A3: 36.17
natural-language-inference-on-anli-test | Bloomberg GPT (1-shot) | A1: 32.9; A2: 34.4; A3: 37.33
natural-language-inference-on-commitmentbank | OPT 66B (1-shot) | Accuracy: 44.64
natural-language-inference-on-commitmentbank | GPT-NeoX (1-shot) | Accuracy: 48.21
natural-language-inference-on-commitmentbank | BLOOM 176B (1-shot) | Accuracy: 48.21
natural-language-inference-on-commitmentbank | Bloomberg GPT (1-shot) | Accuracy: 53.57
natural-language-inference-on-rte | GPT-NeoX 20B (1-shot) | Accuracy: 53.8%
natural-language-inference-on-rte | Bloomberg GPT 50B (1-shot) | Accuracy: 69.3%
natural-language-inference-on-rte | OPT 66B (1-shot) | Accuracy: 54.9%
natural-language-inference-on-rte | BLOOM 176B (1-shot) | Accuracy: 57.4%
question-answering-on-boolq | Bloomberg GPT 50B (1-shot) | Accuracy: 74.6
question-answering-on-boolq | GPT-NeoX 20B (1-shot) | Accuracy: 46.4
question-answering-on-boolq | OPT 66B (1-shot) | Accuracy: 57.5
question-answering-on-boolq | BLOOM 176B (1-shot) | Accuracy: 52.9
question-answering-on-copa | BLOOM 176B (1-shot) | Accuracy: 84
question-answering-on-copa | OPT 66B (1-shot) | Accuracy: 86
question-answering-on-copa | GPT-NeoX (1-shot) | Accuracy: 88
question-answering-on-copa | Bloomberg GPT (1-shot) | Accuracy: 86
question-answering-on-multirc | BLOOM 176B (1-shot) | F1: 26.7
question-answering-on-multirc | GPT-NeoX 20B (1-shot) | F1: 22.9
question-answering-on-multirc | OPT 66B (1-shot) | F1: 18.8
question-answering-on-multirc | Bloomberg GPT 50B (1-shot) | F1: 62.3
question-answering-on-openbookqa | BLOOM 176B (2-shot) | Accuracy: 47.2
question-answering-on-openbookqa | Bloomberg GPT 50B (1-shot) | Accuracy: 51.6
question-answering-on-openbookqa | GPT-NeoX 20B (2-shot) | Accuracy: 44.2
question-answering-on-openbookqa | OPT 66B (1-shot) | Accuracy: 58.0
question-answering-on-piqa | OPT 66B (1-shot) | Accuracy: 77.6
question-answering-on-piqa | GPT-NeoX 20B (1-shot) | Accuracy: 75.8
question-answering-on-piqa | Bloomberg GPT 50B (1-shot) | Accuracy: 77.9
question-answering-on-piqa | BLOOM 176B (1-shot) | Accuracy: 77
reading-comprehension-on-race | BLOOM 176B (1-shot) | Accuracy (High): 39.14; Accuracy (Middle): 52.3
reading-comprehension-on-race | GPT-NeoX (1-shot) | Accuracy (High): 34.33; Accuracy (Middle): 41.23
reading-comprehension-on-race | OPT 66B (1-shot) | Accuracy (High): 37.02; Accuracy (Middle): 47.42
reading-comprehension-on-race | Bloomberg GPT (1-shot) | Accuracy (High): 41.74; Accuracy (Middle): 54.32
sarcasm-detection-on-big-bench-snarks | PaLM 540B (few-shot, k=3) | Accuracy: 78.1
sarcasm-detection-on-big-bench-snarks | Bloomberg GPT (few-shot, k=3) | Accuracy: 69.66
sarcasm-detection-on-big-bench-snarks | BLOOM 176B (few-shot, k=3) | Accuracy: 72.47
sarcasm-detection-on-big-bench-snarks | GPT-NeoX (few-shot, k=3) | Accuracy: 62.36
