Fan Yong, Zhang Zhengbo, Wang Jing
Medical Innovation Research Department, Chinese PLA General Hospital, Beijing 100853, P. R. China.
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, P. R. China.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2024 Oct 25;41(5):1062-1071. doi: 10.7507/1001-5515.202310011.
Deep learning-based multimodal learning is advancing rapidly and is widely used in artificial intelligence-generated content, for example in image-text conversion and image-text generation. Electronic health records are the digital information, such as numerical values, charts, and text, that medical staff generate with information systems in the course of medical activities. Deep learning-based multimodal fusion of electronic health records can help medical staff comprehensively analyze the large volume of multimodal data generated during diagnosis and treatment, supporting accurate diagnosis and timely intervention for patients. In this article, we first introduce the methods and development trends of deep learning-based multimodal data fusion. Second, we summarize and compare studies that fuse structured electronic medical records with other medical data such as images and text, focusing on clinical application type, sample size, and fusion method. Our analysis of the literature identifies two main deep learning approaches to fusing data from different medical modalities: (1) selecting a pre-trained model suited to each data modality for feature representation, followed by late fusion; and (2) fusion based on the attention mechanism. Finally, we discuss the difficulties encountered in multimodal medical data fusion and its future directions, including modeling methods and the evaluation and application of models. We expect this review to provide a reference for building models that can make comprehensive use of medical data across modalities.
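As a rough illustration of the two fusion strategies named in the abstract, the sketch below contrasts decision-level late fusion with attention-weighted fusion of modality embeddings. It is a minimal toy under stated assumptions, not the method of any study covered by the review: the function names (`late_fusion`, `attention_fusion`), the use of a single query vector, and the scaled dot-product scoring are all hypothetical choices made for illustration, and the per-modality embeddings are assumed to come from pre-trained encoders that are not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(probabilities, weights=None):
    """Combine per-modality predicted probabilities by a weighted average.

    Assumption: each modality (e.g. structured records, images, text) has
    its own model that outputs a probability for the same clinical outcome.
    """
    if weights is None:
        weights = [1.0 / len(probabilities)] * len(probabilities)
    return sum(w * p for w, p in zip(weights, probabilities))


def attention_fusion(embeddings, query):
    """Fuse modality embeddings with scaled dot-product attention.

    Assumption: all embeddings and the query share one dimensionality.
    Returns the fused vector and the attention weight of each modality.
    """
    dim = len(query)
    scores = [
        sum(q * e for q, e in zip(query, emb)) / math.sqrt(dim)
        for emb in embeddings
    ]
    attn = softmax(scores)
    fused = [
        sum(a * emb[i] for a, emb in zip(attn, embeddings))
        for i in range(dim)
    ]
    return fused, attn
```

In this sketch, late fusion never lets the modalities interact before prediction, whereas attention fusion produces one joint representation whose weights indicate how much each modality contributed for a given query.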