| TANDA-DeBERTa-V3-Large + ALL | 0.927 | 0.939 | Structural Self-Supervised Objectives for Transformers | |
| Key-Value Memory Network | 0.7069 | 0.7265 | Key-Value Memory Networks for Directly Reading Documents | |
| AP-CNN | 0.6886 | 0.6957 | Attentive Pooling Networks | |
| LSTM (lexical overlap + dist output) | 0.682 | 0.6988 | Neural Variational Inference for Text Processing | |