SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Elias Frantar, Dan Alistarh

Abstract
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
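For intuition on the semi-structured patterns mentioned in the abstract, a 2:4 pattern constrains every group of four consecutive weights to contain at most two nonzeros (the layout accelerated by NVIDIA sparse tensor cores). The sketch below is a minimal PyTorch illustration of that pattern using a plain magnitude criterion; it is not the SparseGPT algorithm, which selects and compensates pruned weights using approximate second-order information, and the `apply_2_to_4_mask` helper is hypothetical rather than part of the released code.

```python
# Minimal sketch (NOT the SparseGPT algorithm): illustrates the 2:4
# semi-structured pattern by keeping the 2 largest-magnitude weights in
# every contiguous group of 4 along the last dimension.
import torch

def apply_2_to_4_mask(weight: torch.Tensor) -> torch.Tensor:
    """Zero out 2 of every 4 consecutive weights along the last dimension."""
    rows, cols = weight.shape
    assert cols % 4 == 0, "last dimension must be divisible by 4"
    groups = weight.reshape(rows, cols // 4, 4)
    # Indices of the 2 smallest-magnitude weights in each group of 4.
    drop = groups.abs().topk(2, dim=-1, largest=False).indices
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop, 0.0)
    return (groups * mask).reshape(rows, cols)

w = torch.randn(8, 16)
w_sparse = apply_2_to_4_mask(w)
# Every group of 4 now holds at most 2 nonzero weights.
assert (w_sparse.reshape(8, -1, 4) != 0).sum(-1).max() <= 2
```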
Benchmarks
| Benchmark | Model / Method | Metric | Value |
|---|---|---|---|
| Common-sense reasoning on ARC (Challenge) | OPT-175B | Accuracy (%) | 43.94 |
| Common-sense reasoning on ARC (Challenge) | OPT-175B (50% sparsity) | Accuracy (%) | 25.6 |
| Common-sense reasoning on ARC (Challenge) | SparseGPT (175B, 50% sparsity) | Accuracy (%) | 41.3 |
| Common-sense reasoning on ARC (Challenge) | SparseGPT (175B, 4:8 sparsity) | Accuracy (%) | 39.85 |
| Common-sense reasoning on ARC (Challenge) | SparseGPT (175B, 2:4 sparsity) | Accuracy (%) | 38.99 |
| Common-sense reasoning on ARC (Easy) | OPT-175B | Accuracy (%) | 71.04 |
| Common-sense reasoning on ARC (Easy) | OPT-175B (50% sparsity) | Accuracy (%) | 28.03 |
| Common-sense reasoning on ARC (Easy) | SparseGPT (175B, 50% sparsity) | Accuracy (%) | 69.65 |
| Common-sense reasoning on ARC (Easy) | SparseGPT (175B, 4:8 sparsity) | Accuracy (%) | 68.35 |
| Common-sense reasoning on ARC (Easy) | SparseGPT (175B, 2:4 sparsity) | Accuracy (%) | 67.08 |
| Language modelling on LAMBADA | OPT-175B | Accuracy (%) | 75.59 |
| Language modelling on LAMBADA | OPT-175B (50% sparsity) | Accuracy (%) | 0.02 |
| Language modelling on LAMBADA | SparseGPT (175B, 50% sparsity) | Accuracy (%) | 76.51 |
| Language modelling on LAMBADA | SparseGPT (175B, 4:8 sparsity) | Accuracy (%) | 78.77 |
| Language modelling on LAMBADA | SparseGPT (175B, 2:4 sparsity) | Accuracy (%) | 79.47 |
| Language modelling on WikiText-2 | OPT-175B | Test perplexity | 8.34 |
| Language modelling on WikiText-2 | OPT-175B (50% sparsity) | Test perplexity | 234.77 |
| Language modelling on WikiText-2 | SparseGPT (175B, 50% sparsity) | Test perplexity | 8.21 |
| Language modelling on WikiText-2 | SparseGPT (175B, 4:8 sparsity) | Test perplexity | 8.45 |
| Language modelling on WikiText-2 | SparseGPT (175B, 2:4 sparsity) | Test perplexity | 8.73 |
| Question answering on PIQA | OPT-175B | Accuracy (%) | 81.07 |
| Question answering on PIQA | OPT-175B (50% sparsity) | Accuracy (%) | 54.73 |
| Question answering on PIQA | SparseGPT (175B, 50% sparsity) | Accuracy (%) | 80.63 |
| Question answering on PIQA | SparseGPT (175B, 4:8 sparsity) | Accuracy (%) | 79.54 |
| Question answering on PIQA | SparseGPT (175B, 2:4 sparsity) | Accuracy (%) | 79.54 |
| Question answering on StoryCloze | OPT-175B | Accuracy (%) | 79.82 |
| Question answering on StoryCloze | OPT-175B (50% sparsity) | Accuracy (%) | 47.10 |
| Question answering on StoryCloze | SparseGPT (175B, 50% sparsity) | Accuracy (%) | 78.87 |
| Question answering on StoryCloze | SparseGPT (175B, 4:8 sparsity) | Accuracy (%) | 77.02 |
| Question answering on StoryCloze | SparseGPT (175B, 2:4 sparsity) | Accuracy (%) | 76.19 |