Natural Language Inference on ANLI Test

Evaluation Metrics

A1: accuracy (%) on the ANLI round 1 test set
A2: accuracy (%) on the ANLI round 2 test set
A3: accuracy (%) on the ANLI round 3 test set
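
Each metric is simply the fraction of correctly classified premise-hypothesis pairs in the corresponding test round. Below is a minimal sketch of how these three numbers can be computed, assuming the Hugging Face `facebook/anli` dataset (splits `test_r1`, `test_r2`, `test_r3`; labels 0 = entailment, 1 = neutral, 2 = contradiction). `predict_fn` is a hypothetical stand-in for any model that maps a (premise, hypothesis) pair to one of those label ids.

```python
# Minimal sketch: compute A1/A2/A3 as per-round accuracy on ANLI.
# Assumes the Hugging Face "facebook/anli" dataset; predict_fn is a
# hypothetical callable (premise, hypothesis) -> label id in {0, 1, 2}.
from datasets import load_dataset

def anli_round_accuracies(predict_fn):
    anli = load_dataset("facebook/anli")
    scores = {}
    for round_split in ("test_r1", "test_r2", "test_r3"):
        split = anli[round_split]
        correct = sum(
            predict_fn(ex["premise"], ex["hypothesis"]) == ex["label"]
            for ex in split
        )
        scores[round_split] = 100.0 * correct / len(split)
    return scores  # {"test_r1": A1, "test_r2": A2, "test_r3": A3}
```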

Evaluation Results

Performance of each model on this benchmark. A dash indicates that no score or repository was reported.

| Model | A1 | A2 | A3 | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- |
| T5-3B (explanation prompting) | 81.8 | 72.5 | 74.8 | Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues} | - |
| PaLM 540B (Self Improvement, Self Consistency) | - | 66.5 | 67.9 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Self Improvement, CoT Prompting) | - | 65.3 | 67.3 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Self Improvement, Standard-Prompting) | - | 64.8 | 66.9 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Self Consistency) | - | 64.5 | 63.4 | Large Language Models Can Self-Improve | - |
| PaLM 2-L (one-shot) | 73.1 | 63.4 | 67.1 | PaLM 2 Technical Report | - |
| T0-11B (explanation prompting) | 75.6 | 60.6 | 59.9 | Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues} | - |
| PaLM 540B (CoT Prompting) | - | 58.9 | 60.6 | Large Language Models Can Self-Improve | - |
| PaLM 540B (Standard-Prompting) | - | 55.8 | 55.8 | Large Language Models Can Self-Improve | - |
| ChatGPT | 62.3 | 52.6 | 54.1 | A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets | - |
| ALUM (RoBERTa-LARGE) | 72.3 | 52.1 | 48.4 | Adversarial Training for Large Neural Language Models | - |
| XLNet (Large) | 70.3 | 50.9 | 49.4 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | - |
| InfoBERT (RoBERTa) | 75 | 50.5 | 47.7 | InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective | - |
| RoBERTa (Large) | 72.4 | 49.8 | 44.4 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | - |
| PaLM 2-M (one-shot) | 58.1 | 49.5 | 54.5 | PaLM 2 Technical Report | - |
| PaLM 2-S (one-shot) | 53.1 | 48.8 | 53.2 | PaLM 2 Technical Report | - |
| T0-3B (CoT fine-tuned) | 41.7 | 37.2 | 41.9 | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | - |
| Flipped-3B | 39.99 | 37.05 | 37.73 | Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | - |
| KiC-770M | 36.30 | 35.00 | 37.60 | Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | - |
| RoE-3B | 35.49 | 34.64 | 31.22 | Exploring the Benefits of Training Expert Language Models over Instruction Tuning | - |
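
For illustration, here is a hedged end-to-end example of producing one row of such a table with the `anli_round_accuracies` helper sketched above, using an off-the-shelf MNLI classifier (`roberta-large-mnli` from Hugging Face Transformers). This model is not one of the leaderboard entries; the sketch only shows the evaluation flow. Note that `roberta-large-mnli` orders its classes as (contradiction, neutral, entailment), the reverse of the ANLI label ids, so the mapping below reorders the predictions.

```python
# Hypothetical usage example: score roberta-large-mnli on the ANLI test
# rounds using the anli_round_accuracies() helper sketched above.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

# roberta-large-mnli: 0 = contradiction, 1 = neutral, 2 = entailment
# ANLI labels:        0 = entailment,    1 = neutral, 2 = contradiction
MNLI_TO_ANLI = {0: 2, 1: 1, 2: 0}

@torch.no_grad()
def predict(premise, hypothesis):
    inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
    pred = model(**inputs).logits.argmax(dim=-1).item()
    return MNLI_TO_ANLI[pred]

print(anli_round_accuracies(predict))  # {"test_r1": ..., "test_r2": ..., "test_r3": ...}
```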