| Turing NLR v5 XXL 5.4B (fine-tuned) | 92.6 | 92.4 | - | - |
| RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 90.2 | - | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | - |
| Q-BERT (Shen et al., 2020) | 87.8 | - | Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT | - |
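The second row applies the vector-wise int8 scheme from LLM.int8() to the MLP weights. As a rough illustration only (not the paper's implementation, which additionally decomposes outlier feature dimensions into a separate fp16 matmul), here is a minimal NumPy sketch of vector-wise absmax quantization: each row of the input and each column of the weight matrix gets its own scaling constant, the matmul runs on int8 values with int32 accumulation, and the result is dequantized by the outer product of the scales. All names here are illustrative.

```python
import numpy as np

def vectorwise_quantize(A, axis):
    # Absmax int8 quantization with one scale per vector along `axis`.
    scale = 127.0 / np.maximum(np.abs(A).max(axis=axis, keepdims=True), 1e-8)
    q = np.clip(np.round(A * scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul_dequant(X, W):
    # Simulated vector-wise int8 matmul of X @ W (illustrative sketch;
    # omits LLM.int8()'s fp16 outlier decomposition).
    qx, sx = vectorwise_quantize(X, axis=1)          # one scale per row of X
    qw, sw = vectorwise_quantize(W, axis=0)          # one scale per column of W
    acc = qx.astype(np.int32) @ qw.astype(np.int32)  # int32 accumulation
    return acc / (sx * sw)                           # dequantize via outer product of scales

rng = np.random.default_rng(0)
X, W = rng.normal(size=(4, 16)), rng.normal(size=(16, 8))
print(np.abs(int8_matmul_dequant(X, W) - X @ W).max())  # small quantization error
```

Because the scales are constant within each row/column pair, they factor out of every dot product, which is what lets the dequantization happen after the integer matmul.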