| Model | Accuracy (%) | Paper | Code |
|---|---|---|---|
| GPT-3 175B (Few-Shot) | 86.4 | Language Models are Few-Shot Learners | - |
| LLaMA-65B+CFG (Zero-Shot) | 84.0 | Stay on topic with Classifier-Free Guidance | - |
| LLaMA-30B+CFG (Zero-Shot) | 83.9 | Stay on topic with Classifier-Free Guidance | - |
| LLaMA-13B+CFG (Zero-Shot) | 82.2 | Stay on topic with Classifier-Free Guidance | - |
| GLM-130B (Bidirectional Attention) | 80.2 | GLM-130B: An Open Bilingual Pre-trained Model | - |
| SparseGPT (175B, 2:4 Sparsity) | 79.47 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | - |
| SparseGPT (175B, 4:8 Sparsity) | 78.77 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | - |
| SparseGPT (175B, 50% Sparsity) | 76.51 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | - |
| GPT-3 175B (Zero-Shot) | 76.2 | Language Models are Few-Shot Learners | - |
| GPT-3 13B (Zero-Shot) | 72.5 | Language Models are Few-Shot Learners | - |