Common Sense Reasoning on CommonsenseQA

Evaluation Metric

Accuracy
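Accuracy on CommonsenseQA is the percentage of questions for which a model selects the correct answer among the five labeled choices (A-E); the scores below are reported in percent. The snippet is a minimal sketch of that computation, assuming predictions and gold answers are given as choice letters. The function name `accuracy` and the sample data are hypothetical and do not come from any leaderboard's official scoring script.

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Return the percentage of questions whose predicted choice label matches the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact label matches (e.g. "C" == "C").
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical example: 4 questions, each answered with one of the labels A-E.
preds = ["A", "C", "E", "B"]
gold = ["A", "C", "D", "B"]
print(f"{accuracy(preds, gold):.1f}")  # 75.0
```

Three of the four hypothetical answers match, so the sketch reports 75.0, on the same percentage scale as the table.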

Evaluation Results

Performance of each model on this benchmark:

Model | Accuracy (%) | Paper Title
GPT-4o (HPT) | 92.54 | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models
DeBERTaV3-large+KEAR | 91.2 | Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention
PaLM 2 (few-shot, CoT, SC) | 90.4 | PaLM 2 Technical Report
KEAR | 89.4 | Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention
DEKCOR | 83.3 | Fusing Context Into Knowledge Graph for Commonsense Question Answering
Unicorn 11B (fine-tuned) | 79.3 | UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark
MUPPET Roberta Large | 79.2 | Muppet: Massive Multi-task Representations with Pre-Finetuning
UnifiedQA 11B (fine-tuned) | 79.1 | UnifiedQA: Crossing Format Boundaries With a Single QA System
DRAGON | 78.2 | Deep Bidirectional Language-Knowledge Graph Pretraining
T5-XXL 11B (fine-tuned) | 78.1 | UnifiedQA: Crossing Format Boundaries With a Single QA System
Albert Lan et al. (2020) (ensemble) | 76.5 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
UnifiedQA 11B (zero-shot) | 76.2 | UnifiedQA: Crossing Format Boundaries With a Single QA System
QA-GNN | 76.1 | QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering
XLNet+GraphReason | 75.3 | Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering
GrapeQA: PEGA | 73.5 | GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering
RoBERTa+HyKAS Ma et al. (2019) | 73.2 | Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering
GPT-3 Direct Finetuned | 73.0 | Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention
STaR (on GPT-J) | 72.3 | STaR: Bootstrapping Reasoning With Reasoning
RoBERTa-Large 355M | 72.1 | RoBERTa: A Robustly Optimized BERT Pretraining Approach
STaR without Rationalization (on GPT-J) | 68.8 | STaR: Bootstrapping Reasoning With Reasoning