PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
Zhang Xiaoman; Wu Chaoyi; Zhao Ziheng; Lin Weixiong; Zhang Ya; Wang Yanfeng; Xie Weidi

Abstract
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret and answer questions based on medical images. In this study, we reframe MedVQA as a generation task that naturally follows human-machine interaction and propose a generative model for medical visual understanding, MedVInT, which aligns visual information from a pre-trained vision encoder with a large language model. We establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs over 149k images covering various modalities and diseases. We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef-2019, where it significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. In addition, we propose a manually verified test set that is significantly more challenging, serving to better monitor the development of generative MedVQA methods. To facilitate comprehensive evaluation and comparison, we maintain a leaderboard at https://paperswithcode.com/paper/pmc-vqa-visual-instruction-tuning-for-medical, offering a centralized resource for tracking progress and benchmarking state-of-the-art approaches. The PMC-VQA dataset is a vital resource for research in this field, and MedVInT represents a significant breakthrough in MedVQA.
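The abstract describes the model only at a high level: visual information from a pre-trained vision encoder is aligned with a large language model, which then generates a free-form answer. The sketch below illustrates one common way to do this, under clearly stated assumptions. It uses a single learned linear projection from vision-feature space into the language model's embedding space and then prepends the projected "visual tokens" to the embedded question. The actual MedVInT alignment module is not described in this abstract, so the function names `project` and `build_llm_input`, the toy dimensions, and the linear projection itself are all hypothetical.

```python
import random

def project(features, weight, bias):
    """Hypothetical linear map of each visual feature vector (dim d_v) to the LLM embedding size (dim d_t)."""
    return [
        [sum(f[i] * weight[i][j] for i in range(len(f))) + bias[j] for j in range(len(bias))]
        for f in features
    ]

def build_llm_input(visual_tokens, question_embeddings):
    """Prepend projected visual tokens to the embedded question so the LLM can attend to both."""
    return visual_tokens + question_embeddings

# Toy sizes (assumed for illustration): 4 image patches with d_v = 3, LLM embedding size d_t = 2.
random.seed(0)
d_v, d_t, n_patches = 3, 2, 4
features = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(n_patches)]  # stand-in encoder output
weight = [[random.gauss(0, 0.1) for _ in range(d_t)] for _ in range(d_v)]         # would be learned in training
bias = [0.0] * d_t
question = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]  # 3 already-embedded question tokens

sequence = build_llm_input(project(features, weight, bias), question)
print(len(sequence), len(sequence[0]))  # 7 2
```

In a real system the projected sequence would be fed to the language model, which decodes the answer token by token. That decoding step is what makes the approach generative rather than a classifier over a fixed answer set.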
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| generative-visual-question-answering-on-pmc | MedVInT | BLEU-1: 23.2 |
| visual-question-answering-vqa-on-pmc-vqa | MedVInT | Accuracy: 42.3 |
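The table reports BLEU-1 for free-form (generative) answers and accuracy for the multiple-choice setting. The sketch below shows how these two metrics are typically computed for a single answer pair and a batch of choices. It is not the benchmark's official scorer: the lowercase whitespace tokenization and the function names `bleu1` and `accuracy` are assumptions. The table's values appear to be on a 0-100 scale, whereas these functions return fractions in [0, 1].

```python
import math
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times the brevity penalty."""
    cand = candidate.lower().split()  # assumed tokenization: lowercase + whitespace split
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate word is credited at most as many times as it appears in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

def accuracy(predictions, answers) -> float:
    """Fraction of multiple-choice predictions that exactly match the ground-truth option."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

print(round(bleu1("the lesion is in the left lung", "lesion in the left lung"), 3))  # 0.714
print(accuracy(["A", "C", "B", "D"], ["A", "B", "B", "D"]))  # 0.75
```

In the example, five of the seven candidate words are matched after clipping (the repeated "the" counts once). Because the candidate is longer than the reference, no brevity penalty applies.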