| Model | % Test Accuracy | % Train Accuracy | Parameters | Paper / Source | Code |
| --- | --- | --- | --- | --- | --- |
| EFL (Entailment as Few-shot Learner) + RoBERTa-large | 93.1 | ? | 355m | Entailment as Few-Shot Learner | - |
| RoBERTa-large + self-explaining layer | 92.3 | ? | 355m+ | Self-Explaining Structures Improve NLP Models | - |
| SJRC (BERT-Large + SRL) | 91.3 | 95.7 | 308m | Explicit Contextual Semantics for Text Comprehension | - |
| Densely-Connected Recurrent and Co-Attentive Network Ensemble | 90.1 | 95.0 | 53.3m | Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information | - |
| Fine-Tuned LM-Pretrained Transformer | 89.9 | 96.6 | 85m | Improving Language Understanding by Generative Pre-Training | - |
| 150D Multiway Attention Network Ensemble | 89.4 | 95.5 | 58m | Multiway Attention Networks for Modeling Sentence Pairs | - |
| ESIM + ELMo Ensemble | 89.3 | 92.1 | 40m | Deep contextualized word representations | - |
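All of the systems above are evaluated as sentence-pair classifiers: given a premise and a hypothesis, the model predicts entailment, neutral, or contradiction. As a minimal sketch of that setup, the snippet below scores one premise/hypothesis pair with an off-the-shelf NLI checkpoint via Hugging Face Transformers. The checkpoint name `roberta-large-mnli` is an illustrative assumption: it is trained on MultiNLI, not SNLI, and is not one of the systems listed in the table, so it will not reproduce the accuracies above.

```python
# Minimal sketch: classify one premise/hypothesis pair with an off-the-shelf
# NLI model. "roberta-large-mnli" is an illustrative checkpoint trained on
# MultiNLI; it is NOT one of the systems in the table above.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Sentence-pair classification: the tokenizer inserts the model's separator
# tokens between premise and hypothesis automatically.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # one of: CONTRADICTION / NEUTRAL / ENTAILMENT
```

The test accuracies in the table correspond to running this kind of prediction over the SNLI test set and measuring the fraction of pairs labeled correctly.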