MedConceptsQA: Open Source Medical Concepts QA Benchmark

Ofir Ben Shoham Nadav Rappoport

Abstract

We present MedConceptsQA, a dedicated open-source benchmark for medical concepts question answering. The benchmark comprises questions about medical concepts drawn from different medical vocabularies: diagnoses, procedures, and drugs. The questions are categorized into three levels of difficulty: easy, medium, and hard. We evaluated various Large Language Models on the benchmark. Our findings show that pre-trained clinical Large Language Models achieved accuracy close to random guessing, despite being pre-trained on medical data. GPT-4, by contrast, achieves an absolute average improvement of 27%-37% over the clinical Large Language Models (27% in zero-shot learning and 37% in few-shot learning). Our benchmark serves as a valuable resource for evaluating how well Large Language Models understand and reason about medical concepts. The benchmark is available at https://huggingface.co/datasets/ofir408/MedConceptsQA
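To make the benchmark concrete, below is a minimal sketch of loading it from the Hugging Face Hub with the datasets library. The configuration name "icd10cm_easy" is an assumption based on the vocabulary/difficulty structure described above; consult the dataset card for the exact configuration and split names.

# Minimal sketch: load MedConceptsQA and inspect one example.
# The configuration name "icd10cm_easy" is an assumption; the actual
# vocabulary/level configurations are listed on the dataset card.
from datasets import load_dataset

ds = load_dataset("ofir408/MedConceptsQA", "icd10cm_easy")
split = next(iter(ds.values()))   # first available split
print(split.num_rows, "questions")
example = split[0]
print(example.keys())             # field names for this benchmark
print(example)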

Code Repositories

nadavlab/MedConceptsQA (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
few-shot-learning-on-medconceptsqa | johnsnowlabs/JSL-MedMNX-7B | Accuracy: 25.627
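For reference, an accuracy figure like the one above can be computed with a simple exact-match loop over the multiple-choice answers. This is a hedged sketch, not the authors' evaluation code: predict stands in for any LLM call (zero-shot, or with few-shot examples packed into the prompt), and the field names "question" and "answer" are assumptions.

# Sketch of multiple-choice accuracy: the share of questions where the
# model's predicted option letter exactly matches the gold answer.
# `predict` is a hypothetical stand-in for an LLM call; the field names
# "question" and "answer" are assumptions, not confirmed schema.
def evaluate(dataset, predict):
    correct = 0
    for ex in dataset:
        if predict(ex["question"]).strip().upper() == ex["answer"].strip().upper():
            correct += 1
    return 100.0 * correct / len(dataset)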
