Multiple Choice Question Answering (MCQA) on MedMCQA

Evaluation Metrics

Dev Set (Acc)
Test Set (Acc)
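
Both metrics are plain classification accuracy over the four-option questions in the MedMCQA dev and test splits. As a reference point, here is a minimal sketch of how such a score could be computed; the Hugging Face dataset id `openlifescienceai/medmcqa`, the field names (`question`, `opa`–`opd`, `cop`), and the `predict_answer` stub are illustrative assumptions, not part of this leaderboard or of any listed paper's evaluation code.

```python
# Minimal sketch (not an official evaluation script): accuracy on the
# MedMCQA dev split. Dataset id and field names are assumptions based on
# the public MedMCQA release on the Hugging Face Hub.
from datasets import load_dataset


def predict_answer(question: str, options: list[str]) -> int:
    """Placeholder for a model's own inference: return the index (0-3)
    of the option it selects."""
    raise NotImplementedError


def dev_accuracy() -> float:
    dev = load_dataset("openlifescienceai/medmcqa", split="validation")
    correct = 0
    for ex in dev:
        options = [ex["opa"], ex["opb"], ex["opc"], ex["opd"]]
        # "cop" holds the index of the correct option in this release.
        if predict_answer(ex["question"], options) == ex["cop"]:
            correct += 1
    return correct / len(dev)
```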

Evaluation Results

The table below lists each model's performance on this benchmark.

| Model | Dev Set (Acc) | Test Set (Acc) | Paper |
| --- | --- | --- | --- |
| Meditron-70B (CoT + SC) | 0.660 | - | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models |
| Codex 5-shot CoT | 0.597 | 0.627 | Can large language models reason about medical questions? |
| VOD (BioLinkBERT) | 0.583 | 0.629 | Variational Open-Domain Question Answering |
| Flan-PaLM (540B, SC) | 0.576 | - | Large Language Models Encode Clinical Knowledge |
| Flan-PaLM (540B, Few-shot) | 0.565 | - | Large Language Models Encode Clinical Knowledge |
| PaLM (540B, Few-shot) | 0.545 | - | Large Language Models Encode Clinical Knowledge |
| Flan-PaLM (540B, CoT) | 0.536 | - | Large Language Models Encode Clinical Knowledge |
| GAL 120B (zero-shot) | 0.529 | - | Galactica: A Large Language Model for Science |
| Flan-PaLM (62B, Few-shot) | 0.462 | - | Large Language Models Encode Clinical Knowledge |
| PaLM (62B, Few-shot) | 0.434 | - | Large Language Models Encode Clinical Knowledge |
| PubMedBERT (Gu et al., 2022) | 0.40 | 0.41 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering |
| SciBERT (Beltagy et al., 2019) | 0.39 | 0.39 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering |
| BioBERT (Lee et al., 2020) | 0.38 | 0.37 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering |
| BERT-Base (Devlin et al., 2019) | 0.35 | 0.33 | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical Domain Question Answering |
| Flan-PaLM (8B, Few-shot) | 0.345 | - | Large Language Models Encode Clinical Knowledge |
| BLOOM (few-shot, k=5) | 0.325 | - | Galactica: A Large Language Model for Science |
| OPT (few-shot, k=5) | 0.296 | - | Galactica: A Large Language Model for Science |
| PaLM (8B, Few-shot) | 0.267 | - | Large Language Models Encode Clinical Knowledge |
| Med-PaLM 2 (ER) | - | 0.723 | Towards Expert-Level Medical Question Answering with Large Language Models |
| BioMedGPT-10B | - | 0.514 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine |