Question Answering on PIQA

Evaluation Metric

Accuracy
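
PIQA is a two-way multiple-choice task, so accuracy is the fraction of questions for which the model prefers the correct solution. Below is a minimal sketch of that computation, assuming PIQA's public field names (`goal`, `sol1`, `sol2`, `label`) and a placeholder `score_fn` standing in for whatever model interface is being evaluated; it is illustrative only, not the scoring code used by the listed papers.

```python
def piqa_accuracy(examples, score_fn):
    """Compute accuracy on PIQA-style two-choice examples.

    examples: iterable of dicts with keys "goal", "sol1", "sol2", "label"
              (label is 0 or 1, indexing the correct solution).
    score_fn: callable (goal, solution) -> float, higher = more plausible.
              This is a hypothetical hook; plug in your model's scorer.
    """
    correct = 0
    total = 0
    for ex in examples:
        scores = [score_fn(ex["goal"], ex["sol1"]),
                  score_fn(ex["goal"], ex["sol2"])]
        pred = scores.index(max(scores))  # pick the higher-scoring solution
        correct += int(pred == ex["label"])
        total += 1
    return correct / total
```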

Evaluation Results

Performance of the listed models on this benchmark.

| Model | Accuracy | Paper Title |
| --- | --- | --- |
| Unicorn 11B (fine-tuned) | 90.1 | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark |
| LLaMA3 8B + MoSLoRA | 89.7 | Mixture-of-Subspaces in Low-Rank Adaptation |
| CompassMTL 567M with Tailor | 88.3 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| LLaMA-3 8B + MixLoRA | 87.6 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| DeBERTa-Large 304M | 87.4 | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering |
| CompassMTL 567M | 87.3 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| LLaMA-2 13B + MixLoRA | 86.8 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| Shakti-LLM (2.5B) | 86.2 | SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments |
| DeBERTa-Large 304M (classification-based) | 85.9 | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering |
| ExDeBERTa 567M | 85.5 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| UnifiedQA 3B | 85.3 | UnifiedQA: Crossing Format Boundaries With a Single QA System |
| PaLM 2-L (1-shot) | 85.0 | PaLM 2 Technical Report |
| Mixtral 8x7B (0-shot) | 83.6 | Mixtral of Experts |
| PaLM 2-M (1-shot) | 83.2 | PaLM 2 Technical Report |
| LLaMA-2 7B + MixLoRA | 83.2 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| Mistral 7B (0-shot) | 83.0 | Mistral 7B |
| LLaMA 65B (0-shot) | 82.8 | LLaMA: Open and Efficient Foundation Language Models |
| LLaMA 2 70B (0-shot) | 82.8 | Llama 2: Open Foundation and Fine-Tuned Chat Models |
| Camelidae-8×34B | 82.7 | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks |
| LLaMA 33B (0-shot) | 82.3 | LLaMA: Open and Efficient Foundation Language Models |