
Abstract
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one shot, without any fine-tuning, at minimal loss of accuracy. This is achieved via SparseGPT, a new pruning method designed specifically to work efficiently and accurately on massive GPT-family models. We can prune the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, reaching 60% unstructured sparsity with a negligible increase in perplexity: remarkably, more than 100 billion weights of these models can be ignored at inference time. SparseGPT also generalizes to semi-structured pruning patterns (such as 2:4 and 4:8) and is compatible with weight-quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
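To make the semi-structured patterns concrete: an n:m pattern keeps at most n nonzero weights in every group of m consecutive weights, so 2:4 and 4:8 both correspond to 50% sparsity. The PyTorch sketch below enforces such a layout by keeping the n largest-magnitude weights per group. Note this magnitude criterion only illustrates the pattern itself (the function name `prune_n_m` is hypothetical, not from the official repo); SparseGPT's actual mask selection and weight update use approximate second-order information.

```python
import torch

def prune_n_m(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Enforce an n:m semi-structured pattern on a 2-D weight matrix:
    in every group of m consecutive weights along a row, keep the n
    largest-magnitude entries and zero out the rest.

    Illustrative magnitude-based sketch only; SparseGPT's Hessian-based
    selection and compensation update are not shown here.
    """
    rows, cols = weight.shape
    assert cols % m == 0, "row length must be divisible by m"
    groups = weight.reshape(rows, cols // m, m)
    # Indices of the n largest |w| within each group of m.
    _, keep_idx = torch.topk(groups.abs(), k=n, dim=-1)
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep_idx, True)
    return (groups * mask).reshape(rows, cols)

# 2:4 keeps exactly 2 of every 4 weights, i.e. 50% sparsity.
w = torch.randn(8, 16)
w_24 = prune_n_m(w, n=2, m=4)
assert (w_24 == 0).float().mean() == 0.5
```

The 2:4 layout is the pattern accelerated by sparse tensor cores on recent NVIDIA GPUs, which is why the paper highlights it alongside unstructured sparsity.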
Code Repositories
| Repository | Framework | Notes |
|---|---|---|
| baithebest/sparsellm | pytorch | Mentioned in GitHub |
| baithebest/adagp | pytorch | Mentioned in GitHub |
| eth-easl/deltazip | pytorch | Mentioned in GitHub |
| nvlabs/maskllm | pytorch | Mentioned in GitHub |
| ist-daslab/sparsegpt | pytorch | Official; Mentioned in GitHub |
| nvidia/tensorrt-model-optimizer | pytorch | Mentioned in GitHub |
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| common-sense-reasoning-on-arc-challenge | OPT-175B (50% Sparsity) | Accuracy: 25.6 |
| common-sense-reasoning-on-arc-challenge | OPT-175B | Accuracy: 43.94 |
| common-sense-reasoning-on-arc-challenge | SparseGPT (175B, 2:4 Sparsity) | Accuracy: 38.99 |
| common-sense-reasoning-on-arc-challenge | SparseGPT (175B, 50% Sparsity) | Accuracy: 41.3 |
| common-sense-reasoning-on-arc-challenge | SparseGPT (175B, 4:8 Sparsity) | Accuracy: 39.85 |
| common-sense-reasoning-on-arc-easy | SparseGPT (175B, 50% Sparsity) | Accuracy: 69.65 |
| common-sense-reasoning-on-arc-easy | SparseGPT (175B, 4:8 Sparsity) | Accuracy: 68.35 |
| common-sense-reasoning-on-arc-easy | OPT-175B | Accuracy: 71.04 |
| common-sense-reasoning-on-arc-easy | SparseGPT (175B, 2:4 Sparsity) | Accuracy: 67.08 |
| common-sense-reasoning-on-arc-easy | OPT-175B (50% Sparsity) | Accuracy: 28.03 |
| language-modelling-on-lambada | OPT-175B (50% Sparsity) | Accuracy: 0.02 |
| language-modelling-on-lambada | SparseGPT (175B, 2:4 Sparsity) | Accuracy: 79.47 |
| language-modelling-on-lambada | SparseGPT (175B, 50% Sparsity) | Accuracy: 76.51 |
| language-modelling-on-lambada | OPT-175B | Accuracy: 75.59 |
| language-modelling-on-lambada | SparseGPT (175B, 4:8 Sparsity) | Accuracy: 78.77 |
| language-modelling-on-wikitext-2 | OPT-175B (50% Sparsity) | Test perplexity: 234.77 |
| language-modelling-on-wikitext-2 | SparseGPT (175B, 50% Sparsity) | Test perplexity: 8.21 |
| language-modelling-on-wikitext-2 | OPT-175B | Test perplexity: 8.34 |
| language-modelling-on-wikitext-2 | SparseGPT (175B, 2:4 Sparsity) | Test perplexity: 8.73 |
| language-modelling-on-wikitext-2 | SparseGPT (175B, 4:8 Sparsity) | Test perplexity: 8.45 |
| question-answering-on-piqa | SparseGPT (175B, 50% Sparsity) | Accuracy: 80.63 |
| question-answering-on-piqa | OPT-175B (50% Sparsity) | Accuracy: 54.73 |
| question-answering-on-piqa | OPT-175B | Accuracy: 81.07 |
| question-answering-on-piqa | SparseGPT (175B, 4:8 Sparsity) | Accuracy: 79.54 |
| question-answering-on-piqa | SparseGPT (175B, 2:4 Sparsity) | Accuracy: 79.54 |
| question-answering-on-storycloze | SparseGPT (175B, 2:4 Sparsity) | Accuracy: 76.19 |
| question-answering-on-storycloze | SparseGPT (175B, 50% Sparsity) | Accuracy: 78.87 |
| question-answering-on-storycloze | SparseGPT (175B, 4:8 Sparsity) | Accuracy: 77.02 |
| question-answering-on-storycloze | OPT-175B | Accuracy: 79.82 |
| question-answering-on-storycloze | OPT-175B (50% Sparsity) | Accuracy: 47.10 |