

Development of a large-scale medical visual question-answering dataset.

Authors

Zhang Xiaoman, Wu Chaoyi, Zhao Ziheng, Lin Weixiong, Zhang Ya, Wang Yanfeng, Xie Weidi

Affiliations

Shanghai Jiao Tong University, Shanghai, China.

Shanghai Artificial Intelligence Laboratory, Shanghai, China.

Publication

Commun Med (Lond). 2024 Dec 21;4(1):277. doi: 10.1038/s43856-024-00709-2.

DOI: 10.1038/s43856-024-00709-2
PMID: 39709495
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11663219/
Abstract

BACKGROUND

Medical Visual Question Answering (MedVQA) enhances diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret medical images. This study aims to redefine MedVQA as a generation task that mirrors human-machine interaction and to develop a model capable of integrating complex visual and textual information.

METHODS

We constructed a large-scale medical visual-question answering dataset, PMC-VQA, containing 227,000 VQA pairs across 149,000 images that span various modalities and diseases. We introduced a generative model that aligns visual information from a pre-trained vision encoder with a large language model. This model was initially trained on PMC-VQA and subsequently fine-tuned on multiple public benchmarks.
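The alignment step described above — mapping features from a pre-trained vision encoder into the language model's embedding space so the two can be processed jointly — can be sketched as a learned linear projection whose outputs are prepended to the question's token embeddings as a visual prefix. The dimensions, names, and random weights below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def align_visual_prefix(vision_feats: np.ndarray,
                        text_embeds: np.ndarray,
                        proj: np.ndarray,
                        num_prefix: int = 8) -> np.ndarray:
    """Project vision-encoder features into the LLM embedding space
    and prepend them as prefix tokens (illustrative sketch only).

    vision_feats: (batch, num_patches, vision_dim)
    text_embeds:  (batch, seq_len, llm_dim)
    proj:         (vision_dim, llm_dim) learnable projection matrix
    """
    # Project the first num_prefix patch features into LLM space.
    prefix = vision_feats[:, :num_prefix, :] @ proj
    # The language model then attends over [visual prefix; question tokens].
    return np.concatenate([prefix, text_embeds], axis=1)

# Toy usage: random tensors stand in for real encoder/LLM outputs.
rng = np.random.default_rng(0)
v = rng.normal(size=(2, 196, 768))      # e.g. ViT patch features
t = rng.normal(size=(2, 32, 4096))      # question token embeddings
W = rng.normal(size=(768, 4096)) * 0.02 # projection weights
fused = align_visual_prefix(v, t, W)
print(fused.shape)                      # (2, 40, 4096)
```

During training only the projection (and optionally the language model) would be updated, which is what lets a frozen or lightly tuned vision encoder feed a generative LLM.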

RESULTS

Here, we show that our model significantly outperforms existing MedVQA models in generating relevant, accurate free-form answers. We also propose a manually verified test set that presents a greater challenge and serves as a robust measure to monitor the advancement of generative MedVQA methods.
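One common way to score free-form generations against a fixed-choice test set is to map each generated answer to the closest option by word overlap and then measure accuracy. The sketch below illustrates that idea; it is an assumption for illustration, not the paper's exact evaluation protocol:

```python
def closest_option(generated: str, options: list[str]) -> int:
    """Index of the option sharing the most word tokens with the
    generated free-form answer (ties broken by lowest index)."""
    gen_tokens = set(generated.lower().split())
    overlaps = [len(gen_tokens & set(opt.lower().split())) for opt in options]
    return overlaps.index(max(overlaps))

def choice_accuracy(preds: list[str], options: list[list[str]],
                    gold: list[int]) -> float:
    """Fraction of generated answers mapped to the correct option."""
    hits = sum(closest_option(p, o) == g
               for p, o, g in zip(preds, options, gold))
    return hits / len(gold)

# Toy example with two hypothetical VQA items.
preds = ["the scan shows a fracture of the femur", "normal chest x-ray"]
options = [["femur fracture", "rib fracture", "no fracture"],
           ["pneumonia", "normal chest x-ray", "pleural effusion"]]
gold = [0, 1]
print(choice_accuracy(preds, options, gold))  # 1.0
```

Matching on surface tokens is crude (it ignores synonyms and negation), which is one reason a manually verified test set is valuable for tracking real progress.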

CONCLUSIONS

The PMC-VQA dataset proves to be an essential resource for the research community, and our model marks a significant breakthrough in MedVQA. We maintain a leaderboard to facilitate comprehensive evaluation and comparison, providing a centralized resource for benchmarking state-of-the-art approaches.


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fec2/11663219/44ddbfc8180d/43856_2024_709_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fec2/11663219/9759ed464f77/43856_2024_709_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fec2/11663219/0ff0e6c9935c/43856_2024_709_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fec2/11663219/0f76532d1ace/43856_2024_709_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fec2/11663219/057e7f701525/43856_2024_709_Fig5_HTML.jpg

Similar articles

1. Development of a large-scale medical visual question-answering dataset.
   Commun Med (Lond). 2024 Dec 21;4(1):277. doi: 10.1038/s43856-024-00709-2.
2. Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering.
   J Biomed Inform. 2024 Dec;160:104748. doi: 10.1016/j.jbi.2024.104748. Epub 2024 Nov 12.
3. Interpretable medical image Visual Question Answering via multi-modal relationship graph learning.
   Med Image Anal. 2024 Oct;97:103279. doi: 10.1016/j.media.2024.103279. Epub 2024 Jul 20.
4. Structure Causal Models and LLMs Integration in Medical Visual Question Answering.
   IEEE Trans Med Imaging. 2025 Aug;44(8):3476-3489. doi: 10.1109/TMI.2025.3564320.
5. Parallel multi-head attention and term-weighted question embedding for medical visual question answering.
   Multimed Tools Appl. 2023 Mar 11:1-22. doi: 10.1007/s11042-023-14981-2.
6. Integrating deep learning for visual question answering in Agricultural Disease Diagnostics: Case Study of Wheat Rust.
   Sci Rep. 2024 Nov 15;14(1):28203. doi: 10.1038/s41598-024-79793-2.
7. Robust visual question answering via polarity enhancement and contrast.
   Neural Netw. 2024 Nov;179:106560. doi: 10.1016/j.neunet.2024.106560. Epub 2024 Jul 20.
8. Vision-Language Transformer for Interpretable Pathology Visual Question Answering.
   IEEE J Biomed Health Inform. 2023 Apr;27(4):1681-1690. doi: 10.1109/JBHI.2022.3163751. Epub 2023 Apr 4.
9. Medical Visual Question Answering via Conditional Reasoning and Contrastive Learning.
   IEEE Trans Med Imaging. 2023 May;42(5):1532-1545. doi: 10.1109/TMI.2022.3232411. Epub 2023 May 2.
10. Medical visual question answering based on question-type reasoning and semantic space constraint.
    Artif Intell Med. 2022 Sep;131:102346. doi: 10.1016/j.artmed.2022.102346. Epub 2022 Jun 30.

Cited by

1. Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data.
   Nat Commun. 2025 Aug 23;16(1):7866. doi: 10.1038/s41467-025-62385-7.

References

1. Interactive computer-aided diagnosis on medical image using large language models.
   Commun Eng. 2024 Sep 17;3(1):133. doi: 10.1038/s44172-024-00271-8.
2. Patient-centered radiology reports with generative artificial intelligence: adding value to radiology reporting.
   Sci Rep. 2024 Jun 8;14(1):13218. doi: 10.1038/s41598-024-63824-z.
3. PMC-LLaMA: toward building open-source language models for medicine.
   J Am Med Inform Assoc. 2024 Sep 1;31(9):1833-1843. doi: 10.1093/jamia/ocae045.
4. The future landscape of large language models in medicine.
   Commun Med (Lond). 2023 Oct 10;3(1):141. doi: 10.1038/s43856-023-00370-1.
5. Medical visual question answering: A survey.
   Artif Intell Med. 2023 Sep;143:102611. doi: 10.1016/j.artmed.2023.102611. Epub 2023 Jun 8.
6. The Role of Large Language Models in Medical Education: Applications and Implications.
   JMIR Med Educ. 2023 Aug 14;9:e50945. doi: 10.2196/50945.
7. Large language models in medicine.
   Nat Med. 2023 Aug;29(8):1930-1940. doi: 10.1038/s41591-023-02448-8. Epub 2023 Jul 17.
8. Large language models encode clinical knowledge.
   Nature. 2023 Aug;620(7972):172-180. doi: 10.1038/s41586-023-06291-2. Epub 2023 Jul 12.
9. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models.
   PLOS Digit Health. 2023 Feb 9;2(2):e0000198. doi: 10.1371/journal.pdig.0000198. eCollection 2023 Feb.
10. The Medical Segmentation Decathlon.
    Nat Commun. 2022 Jul 15;13(1):4128. doi: 10.1038/s41467-022-30695-9.