Sun Zhaoyi, Lin Mingquan, Zhu Qingqing, Xie Qianqian, Wang Fei, Lu Zhiyong, Peng Yifan
ArXiv. 2023 Oct 18:arXiv:2307.07362v3.
Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which involves the integration of multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it has only recently caught researchers' attention. There is therefore a critical need to conduct a systematic review of this topic, identify the limitations of current work, and explore future directions. In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps, with a focus on the joint learning of biomedical images and text, as these are the most commonly available data types in MDL research. This study reviewed the current uses of multimodal deep learning on five tasks: (1) report generation, (2) visual question answering, (3) cross-modal retrieval, (4) computer-aided diagnosis, and (5) semantic segmentation. Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate collaboration between the natural language processing (NLP) and medical imaging communities and support the development of the next generation of decision-making and computer-assisted diagnostic systems.