Question Answering on MedQA-USMLE

Evaluation Metrics

Accuracy
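
Accuracy here is the fraction of multiple-choice questions for which the model's selected option matches the gold answer. Below is a minimal sketch of that computation; the option letters and variable names are illustrative assumptions, not the official dataset schema or any paper's evaluation code.

```python
# Minimal sketch: accuracy for MedQA-USMLE-style multiple-choice QA.
# Field names and example values are illustrative, not the official schema.

def accuracy(predictions, references):
    """Fraction of questions where the predicted option matches the gold answer."""
    assert len(predictions) == len(references), "prediction/reference length mismatch"
    correct = sum(pred == gold for pred, gold in zip(predictions, references))
    return correct / len(references)

if __name__ == "__main__":
    # Example: 3 of 4 predicted option letters match the gold answers.
    preds = ["A", "C", "B", "D"]
    golds = ["A", "C", "B", "E"]
    print(f"Accuracy: {accuracy(preds, golds):.1%}")  # Accuracy: 75.0%
```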

Evaluation Results

Performance of each model on this benchmark is shown in the table below.

| Model | Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| Med-Gemini | 91.1 | Capabilities of Gemini Models in Medicine | - |
| GPT-4 | 90.2 | Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | |
| Med-PaLM 2 | 85.4 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| Med-PaLM 2 (CoT + SC) | 83.7 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| Med-PaLM 2 (5-shot) | 79.7 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| MedMobile (3.8B) | 75.7 | MedMobile: A mobile-sized language model with expert-level clinical capabilities | |
| Meerkat-7B | 74.3 | Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks | - |
| Meerkat-7B (Single) | 70.6 | Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks | - |
| Meditron-70B (CoT + SC) | 70.2 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | |
| Flan-PaLM (540B) | 67.6 | Large Language Models Encode Clinical Knowledge | |
| LLAMA-2 (70B SC CoT) | 61.5 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | |
| Shakti-LLM (2.5B) | 60.3 | SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | - |
| Codex 5-shot CoT | 60.2 | Can large language models reason about medical questions? | |
| LLAMA-2 (70B) | 59.2 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | |
| VOD (BioLinkBERT) | 55.0 | Variational Open-Domain Question Answering | |
| BioMedGPT-10B | 50.4 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | |
| PubMedGPT (2.7B) | 50.3 | Large Language Models Encode Clinical Knowledge | |
| DRAGON + BioLinkBERT | 47.5 | Deep Bidirectional Language-Knowledge Graph Pretraining | |
| BioLinkBERT (340M) | 45.1 | Large Language Models Encode Clinical Knowledge | |
| GAL 120B (zero-shot) | 44.4 | Galactica: A Large Language Model for Science | |