| ST-MoE-32B 269B (fine-tuned) | 92.4 | ST-MoE: Designing Stable and Transferable Sparse Expert Models | |
| PaLM 540B (fine-tuned) | 92.2 | PaLM: Scaling Language Modeling with Pathways | |
| UL2 20B (fine-tuned) | 90.8 | UL2: Unifying Language Learning Paradigms | |
| FLAN 137B (prompt-tuned) | 86.3 | Finetuned Language Models Are Zero-Shot Learners | |
| RoBERTa-large 355M + Entailment as Few-Shot Learner | 86.0 | Entailment as Few-Shot Learner | |