5 months ago

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

Zhang Xiaoman ; Wu Chaoyi ; Zhao Ziheng ; Lin Weixiong ; Zhang Ya ; Wang Yanfeng ; Xie Weidi

Abstract

Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret and answer questions based on medical images. In this study, we reframe the problem of MedVQA as a generation task that naturally follows the human-machine interaction and propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model. We establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs of 149k images that cover various modalities or diseases. We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef-2019, significantly outperforming existing MedVQA models in generating relevant, accurate free-form answers. In addition, we propose a test set that has undergone manual verification, which is significantly more challenging, serving to better monitor the development of generative MedVQA methods. To facilitate comprehensive evaluation and comparison, we have maintained a leaderboard at https://paperswithcode.com/paper/pmc-vqa-visual-instruction-tuning-for-medical, offering a centralized resource for tracking progress and benchmarking state-of-the-art approaches. The PMC-VQA dataset emerges as a vital resource for the field of research, and the MedVInT presents a significant breakthrough in the area of MedVQA.

Code Repositories

zihanzhaosjtu/librisqa (mentioned in GitHub)
xiaoman-zhang/PMC-VQA (official; pytorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
generative-visual-question-answering-on-pmc | MedVInT | BLEU-1: 23.2
visual-question-answering-vqa-on-pmc-vqa | MedVInT | Accuracy: 42.3
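
The two metrics in the benchmark entries can be illustrated with a minimal sketch. BLEU-1 is shown here as clipped unigram precision multiplied by a brevity penalty, and accuracy as the fraction of exactly matching answers. The page does not describe the evaluation script, so the whitespace tokenization, the lowercasing, the exact-match rule, and the function names `bleu1` and `accuracy` are all assumptions made for illustration; the leaderboard's own scoring may differ.

```python
import math
from collections import Counter


def bleu1(reference: str, candidate: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    Assumption: lowercase, whitespace tokenization (not taken from the source).
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)
    # Clip each candidate token's count by how often it occurs in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = overlap / len(cand_tokens)
    # The brevity penalty lowers the score of candidates shorter than the reference.
    if len(cand_tokens) >= len(ref_tokens):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(ref_tokens) / len(cand_tokens))
    return penalty * precision


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of predictions equal to the gold answer.

    Assumption: case-insensitive exact match after trimming whitespace.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(g.strip().lower() == p.strip().lower()
                  for g, p in zip(gold, predicted))
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical answers, not taken from PMC-VQA.
    print(bleu1("the lung is clear", "the the the the"))
    print(accuracy(["A", "B", "C", "D"], ["a", "B", "D", "D"]))
```

Clipping keeps a candidate from earning credit by repeating one matching word, and the brevity penalty keeps very short answers from scoring a perfect precision. Scores on leaderboards are commonly reported scaled by 100, whereas this sketch returns values between 0 and 1.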
