Reading Comprehension on RACE

Evaluation Metrics

Accuracy
Accuracy (High)
Accuracy (Middle)
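
To make the metric definitions concrete, here is a minimal sketch (with hypothetical example data) of how the overall accuracy and the two per-subset accuracies are computed from the same pool of predictions; the subset tags and sample pairs below are illustrative, not taken from any of the listed papers:

```python
from typing import List, Tuple

def accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (prediction, gold) pairs that match."""
    return sum(p == g for p, g in pairs) / len(pairs)

# Hypothetical example data: (prediction, gold answer, subset).
results = [
    ("A", "A", "middle"),
    ("C", "B", "middle"),
    ("D", "D", "high"),
    ("B", "B", "high"),
    ("A", "C", "high"),
]

overall = accuracy([(p, g) for p, g, _ in results])
middle = accuracy([(p, g) for p, g, s in results if s == "middle"])
high = accuracy([(p, g) for p, g, s in results if s == "high"])
print(f"Accuracy: {overall:.3f}  High: {high:.3f}  Middle: {middle:.3f}")
```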

Evaluation Results

Performance of each model on this benchmark. Accuracy (High) and Accuracy (Middle) refer to the RACE-High (high-school) and RACE-Middle (middle-school) subsets of the test set; Accuracy is measured over the combined test set.

| Model | Accuracy | Accuracy (High) | Accuracy (Middle) | Paper Title |
| --- | --- | --- | --- | --- |
| ALBERT-xxlarge + DUMA (ensemble) | 89.8 | 92.6 | 88.7 | DUMA: Reading Comprehension with Transposition Thinking |
| Megatron-BERT (ensemble) | 90.9 | 90.0 | 93.1 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism |
| Megatron-BERT | 89.5 | 88.6 | 91.8 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism |
| B10-10-10 | 85.7 | 84.4 | 88.8 | Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing |
| XLNet | - | 84.0 | 88.6 | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| RoBERTa | 83.2 | 81.3 | 86.5 | RoBERTa: A Robustly Optimized BERT Pretraining Approach |
| LLaMA 65B (zero-shot) | - | 51.6 | 67.9 | LLaMA: Open and Efficient Foundation Language Models |
| PaLM 540B (zero-shot) | - | 49.1 | 68.1 | PaLM: Scaling Language Modeling with Pathways |
| LLaMA 33B (zero-shot) | - | 48.3 | 64.1 | LLaMA: Open and Efficient Foundation Language Models |
| PaLM 62B (zero-shot) | - | 47.5 | 64.3 | PaLM: Scaling Language Modeling with Pathways |
| LLaMA 13B (zero-shot) | - | 47.2 | 61.6 | LLaMA: Open and Efficient Foundation Language Models |
| LLaMA 7B (zero-shot) | - | 46.9 | 61.1 | LLaMA: Open and Efficient Foundation Language Models |
| GPT-3 175B (zero-shot) | - | 45.5 | - | Language Models are Few-Shot Learners |
| PaLM 8B (zero-shot) | - | 42.3 | 57.9 | PaLM: Scaling Language Modeling with Pathways |
| BloombergGPT (one-shot) | - | 41.74 | 54.32 | BloombergGPT: A Large Language Model for Finance |
| BLOOM 176B (one-shot) | - | 39.14 | 52.3 | BloombergGPT: A Large Language Model for Finance |
| OPT 66B (one-shot) | - | 37.02 | 47.42 | BloombergGPT: A Large Language Model for Finance |
| GPT-NeoX (one-shot) | - | 34.33 | 41.23 | BloombergGPT: A Large Language Model for Finance |
| DeBERTa-large | 86.8 | - | - | DeBERTa: Decoding-enhanced BERT with Disentangled Attention |
| GPT-3 175B (zero-shot) | - | - | 58.4 | Language Models are Few-Shot Learners |
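
For reference, below is a hedged sketch of an evaluation loop over RACE using the Hugging Face `datasets` library, assuming the `race` dataset id with its `middle` and `high` configurations is available on the Hub; `predict` is a hypothetical placeholder for the model under evaluation (each paper above uses its own setup, which this sketch does not reproduce):

```python
from datasets import load_dataset

def predict(article: str, question: str, options: list) -> str:
    # Hypothetical stand-in: a real system would score each of the four
    # options with the model and return the letter of the best one.
    return "A"

totals, hits = {}, {}
for subset in ("middle", "high"):
    data = load_dataset("race", subset, split="test")
    # Each example carries an article, a question, four options,
    # and a gold answer letter ("A"-"D").
    hits[subset] = sum(
        predict(ex["article"], ex["question"], ex["options"]) == ex["answer"]
        for ex in data
    )
    totals[subset] = len(data)
    print(f"RACE-{subset} accuracy: {hits[subset] / totals[subset]:.3f}")

# Overall accuracy pools both subsets, matching the first column above.
print(f"Overall accuracy: {sum(hits.values()) / sum(totals.values()):.3f}")
```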