| ST-MoE-32B 269B (fine-tuned) | 95.2 | ST-MoE: Designing Stable and Transferable Sparse Expert Models | - |
| LLaMA 3 8B + MoSLoRA (fine-tuned) | 90.5 | Mixture-of-Subspaces in Low-Rank Adaptation | - |
| LLaMA 65B + CFG (0-shot) | 84.2 | Stay on topic with Classifier-Free Guidance | - |
| LLaMA 30B + CFG (0-shot) | 83.2 | Stay on topic with Classifier-Free Guidance | - |
| FLAN 137B (few-shot, k=14) | 80.7 | Finetuned Language Models Are Zero-Shot Learners | - |
| LLaMA 13B + CFG (0-shot) | 79.1 | Stay on topic with Classifier-Free Guidance | - |