Question Answering on PubMedQA

Evaluation Metric

Accuracy
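Accuracy on PubMedQA is the fraction of questions whose predicted answer exactly matches the gold label, which in this dataset is one of "yes", "no", or "maybe". A minimal sketch of the computation (the function name and the toy labels below are illustrative, not from any official evaluation script):

```python
def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("prediction/gold length mismatch")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Toy example with made-up labels from PubMedQA's answer set:
gold = ["yes", "no", "maybe", "yes"]
pred = ["yes", "no", "yes", "yes"]
print(accuracy(pred, gold))  # 0.75
```

A score of 78.0 in the table below therefore means 78.0% of the test questions were answered with the exact gold label.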

Evaluation Results

Performance of each model on this benchmark:

| Model | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| Meditron-70B (CoT + SC) | 81.6 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | |
| BioGPT-Large (1.5B) | 81.0 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | |
| RankRAG-llama3-70B (Zero-Shot) | 79.8 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | - |
| Med-PaLM 2 (5-shot) | 79.2 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| Flan-PaLM (540B, Few-shot) | 79 | Large Language Models Encode Clinical Knowledge | |
| BioGPT (345M) | 78.2 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | |
| Codex 5-shot CoT | 78.2 | Can large language models reason about medical questions? | |
| Human Performance (single annotator) | 78.0 | PubMedQA: A Dataset for Biomedical Research Question Answering | |
| GAL 120B (zero-shot) | 77.6 | Galactica: A Large Language Model for Science | |
| Flan-PaLM (62B, Few-shot) | 77.2 | Large Language Models Encode Clinical Knowledge | |
| MediSwift-XL | 76.8 | MediSwift: Efficient Sparse Pre-trained Biomedical Language Models | - |
| Flan-T5-XXL | 76.80 | Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark | - |
| BioMedGPT-10B | 76.1 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | |
| Claude 3 Opus (5-shot) | 75.8 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| Flan-PaLM (540B, SC) | 75.2 | Large Language Models Encode Clinical Knowledge | |
| Med-PaLM 2 (ER) | 75.0 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| Claude 3 Opus (zero-shot) | 74.9 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| Med-PaLM 2 (CoT + SC) | 74.0 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| BLOOM (zero-shot) | 73.6 | Galactica: A Large Language Model for Science | |
| CoT-T5-11B (1024-Shot) | 73.42 | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | |