| Model | Accuracy (%) | Paper | Code |
| --- | --- | --- | --- |
| RoBERTa-large with LlamBERT | 96.68 | LlamBERT: Large-scale low-cost data annotation in NLP | - |
| Heinsen Routing + RoBERTa Large | 96.2 | An Algorithm for Routing Vectors in Sequences | - |
| RoBERTa-large 355M + Entailment as Few-shot Learner | 96.1 | Entailment as Few-Shot Learner | - |
| DV-ngrams-cosine with NB sub-sampling + RoBERTa.base | 95.94 | The Document Vectors Using Cosine Similarity Revisited | - |
| DV-ngrams-cosine + RoBERTa.base | 95.92 | The Document Vectors Using Cosine Similarity Revisited | - |
| Llama-2-70b-chat (0-shot) | 95.39 | LlamBERT: Large-scale low-cost data annotation in NLP | - |
| FLAN 137B (few-shot, k=2) | 95 | Finetuned Language Models Are Zero-Shot Learners | - |
| Block-sparse LSTM | 94.99 | GPU Kernels for Block-Sparse Weights | - |