Common Sense Reasoning On Winogrande

Evaluation Metric

Accuracy
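
WinoGrande is a binary-choice pronoun-resolution task, so accuracy here is simply the fraction of items for which the model selects the correct option. Below is a minimal sketch of that computation, assuming predictions and gold labels are given as lists of option indices; the function and variable names are illustrative, not taken from any official scorer.

```python
# Minimal sketch of the Accuracy metric on a binary-choice task like WinoGrande.
# Assumes predictions and references are equal-length lists of option indices
# (e.g. 1 or 2); names here are illustrative, not from an official evaluator.

def accuracy(predictions, references):
    """Fraction of items where the predicted option matches the gold option."""
    assert len(predictions) == len(references)
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Example: 3 of 4 items correct -> 0.75, reported as 75.0 on the leaderboard.
print(accuracy([1, 2, 2, 1], [1, 2, 1, 1]))
```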

Evaluation Results

Performance of each model on this benchmark

| Model | Accuracy | Paper Title |
| --- | --- | --- |
| ST-MoE-32B 269B (fine-tuned) | 96.1 | ST-MoE: Designing Stable and Transferable Sparse Expert Models |
| Unicorn 11B (fine-tuned) | 91.3 | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark |
| CompassMTL 567M with Tailor | 90.5 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| CompassMTL 567M | 89.6 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| UnifiedQA 11B (fine-tuned) | 89.4 | UnifiedQA: Crossing Format Boundaries With a Single QA System |
| Claude 3 Opus (5-shot) | 88.5 | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| GPT-4 (5-shot) | 87.5 | GPT-4 Technical Report |
| ExDeBERTa 567M | 87.0 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| LLaMA-2 13B + MixLoRA | 86.3 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| LLaMA3 8B + MoSLoRA | 85.8 | Mixture-of-Subspaces in Low-Rank Adaptation |
| PaLM 2-L (1-shot) | 83.0 | PaLM 2 Technical Report |
| LLaMA-3 8B + MixLoRA | 82.1 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| ST-MoE-L 4.1B (fine-tuned) | 81.7 | ST-MoE: Designing Stable and Transferable Sparse Expert Models |
| GPT-3.5 (5-shot) | 81.6 | GPT-4 Technical Report |
| PaLM 540B (0-shot) | 81.1 | PaLM: Scaling Language Modeling with Pathways |
| Camelidae-8×34B | 80.9 | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks |
| PaLM 2-M (1-shot) | 79.2 | PaLM 2 Technical Report |
| RoBERTa-Winogrande 355M (fine-tuned) | 79.1 | WinoGrande: An Adversarial Winograd Schema Challenge at Scale |
| PaLM 2-S (1-shot) | 77.9 | PaLM 2 Technical Report |
| Mixtral 8x7B (0-shot) | 77.2 | Mixtral of Experts |