Faculty of Engineering, Monash University, Clayton, VIC, 3800, Australia.
eResearch Center, Monash University, Clayton, VIC, 3800, Australia.
Artif Intell Med. 2023 Sep;143:102611. doi: 10.1016/j.artmed.2023.102611. Epub 2023 Jun 8.
Medical Visual Question Answering (VQA) combines medical artificial intelligence with the popular VQA challenge. Given a medical image and a clinically relevant question in natural language, a medical VQA system is expected to predict a plausible and convincing answer. Although general-domain VQA has been extensively studied, medical VQA still needs dedicated investigation because of its task-specific characteristics. In the first part of this survey, we collect the publicly available medical VQA datasets to date and discuss their data sources, data quantity, and task features. In the second part, we review the approaches used in medical VQA tasks, summarizing and discussing their techniques, innovations, and potential improvements. In the last part, we analyze some medical-specific challenges for the field and discuss future research directions. Our goal is to provide comprehensive and helpful information for researchers interested in medical visual question answering and to encourage further research in this field.