| PaLM 540B (fine-tuned) | 78.8 | PaLM: Scaling Language Modeling with Pathways | |
| ST-MoE-32B 269B (fine-tuned) | 77.7 | ST-MoE: Designing Stable and Transferable Sparse Expert Models | |
| UL2 20B (fine-tuned) | 77.3 | UL2: Unifying Language Learning Paradigms | |
| SenseBERT-large 340M | 72.1 | SenseBERT: Driving Some Sense into BERT | - |
| SenseBERT-base 110M | 70.3 | SenseBERT: Driving Some Sense into BERT | - |