Medical visual question answering: A survey.

Affiliations

Faculty of Engineering, Monash University, Clayton, VIC, 3800, Australia.

eResearch Center, Monash University, Clayton, VIC, 3800, Australia.

Publication information

Artif Intell Med. 2023 Sep;143:102611. doi: 10.1016/j.artmed.2023.102611. Epub 2023 Jun 8.

DOI:10.1016/j.artmed.2023.102611

PMID:37673579

Abstract

Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges. Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer. Although the general-domain VQA has been extensively studied, the medical VQA still needs specific investigation and exploration due to its task features. In the first part of this survey, we collect and discuss the publicly available medical VQA datasets up-to-date about the data source, data quantity, and task feature. In the second part, we review the approaches used in medical VQA tasks. We summarize and discuss their techniques, innovations, and potential improvements. In the last part, we analyze some medical-specific challenges for the field and discuss future research directions. Our goal is to provide comprehensive and helpful information for researchers interested in the medical visual question answering field and encourage them to conduct further research in this field.

Similar articles

Parallel multi-head attention and term-weighted question embedding for medical visual question answering.

Multimed Tools Appl. 2023 Mar 11:1-22. doi: 10.1007/s11042-023-14981-2.

Robust visual question answering via polarity enhancement and contrast.

Neural Netw. 2024 Nov;179:106560. doi: 10.1016/j.neunet.2024.106560. Epub 2024 Jul 20.

Interpretable medical image Visual Question Answering via multi-modal relationship graph learning.

Med Image Anal. 2024 Oct;97:103279. doi: 10.1016/j.media.2024.103279. Epub 2024 Jul 20.

Multi-Modal Explicit Sparse Attention Networks for Visual Question Answering.

Sensors (Basel). 2020 Nov 26;20(23):6758. doi: 10.3390/s20236758.

The multi-modal fusion in visual question answering: a review of attention mechanisms.

PeerJ Comput Sci. 2023 May 30;9:e1400. doi: 10.7717/peerj-cs.1400. eCollection 2023.

Medical visual question answering based on question-type reasoning and semantic space constraint.

Artif Intell Med. 2022 Sep;131:102346. doi: 10.1016/j.artmed.2022.102346. Epub 2022 Jun 30.

Multitask Learning for Visual Question Answering.

IEEE Trans Neural Netw Learn Syst. 2023 Mar;34(3):1380-1394. doi: 10.1109/TNNLS.2021.3105284. Epub 2023 Feb 28.

Advancing Accuracy in Multimodal Medical Tasks Through Bootstrapped Language-Image Pretraining (BioMedBLIP): Performance Evaluation Study.

JMIR Med Inform. 2024 Aug 5;12:e56627. doi: 10.2196/56627.

Advancing surgical VQA with scene graph knowledge.

Int J Comput Assist Radiol Surg. 2024 Jul;19(7):1409-1417. doi: 10.1007/s11548-024-03141-y. Epub 2024 May 23.

Cited by

Accuracy and quality of ChatGPT-4o and Google Gemini performance on image-based neurosurgery board questions.

Neurosurg Rev. 2025 Mar 25;48(1):320. doi: 10.1007/s10143-025-03472-7.

Development of a large-scale medical visual question-answering dataset.

Commun Med (Lond). 2024 Dec 21;4(1):277. doi: 10.1038/s43856-024-00709-2.

Informed-Learning-Guided Visual Question Answering Model of Crop Disease.

Plant Phenomics. 2024 Dec 16;6:0277. doi: 10.34133/plantphenomics.0277. eCollection 2024.

Vision-language models for medical report generation and visual question answering: a review.

Front Artif Intell. 2024 Nov 19;7:1430984. doi: 10.3389/frai.2024.1430984. eCollection 2024.

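Advancing Accuracy in Multimodal Medical Tasks Through Bootstrapped Language-Image Pretraining (BioMedBLIP): Performance Evaluation Study.

JMIR Med Inform. 2024 Aug 5;12:e56627. doi: 10.2196/56627.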

ChatFFA: An ophthalmic chat system for unified vision-language understanding and question answering for fundus fluorescein angiography.

iScience. 2024 May 17;27(7):110021. doi: 10.1016/j.isci.2024.110021. eCollection 2024 Jul 19.

Challenges and barriers of using large language models (LLM) such as ChatGPT for diagnostic medicine with a focus on digital pathology - a recent scoping review.

Diagn Pathol. 2024 Feb 27;19(1):43. doi: 10.1186/s13000-024-01464-7.

Overcoming the Challenges in the Development and Implementation of Artificial Intelligence in Radiology: A Comprehensive Review of Solutions Beyond Supervised Learning.

Korean J Radiol. 2023 Nov;24(11):1061-1080. doi: 10.3348/kjr.2023.0393. Epub 2023 Aug 28.

BPI-MVQA: a bi-branch model for medical visual question answering.

BMC Med Imaging. 2022 Apr 29;22(1):79. doi: 10.1186/s12880-022-00800-x.
