Xing Suxia, Fang Junze, Ju Zihan, Guo Zheng, Wang Yu
School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2024 Feb 25;41(1):60-69. doi: 10.7507/1001-5515.202304001.
Automatic generation of medical image reports faces several challenges, including the diversity of disease types and a lack of professionalism and fluency in the generated descriptions. To address these issues, this paper proposes a memory-driven multimodal medical image report generation method (mMIRmd). First, a hierarchical vision transformer using shifted windows (Swin-Transformer) extracts multi-view visual features from the patient's medical images, and bidirectional encoder representations from transformers (BERT) extracts semantic features from the textual medical history. The visual and semantic features are then fused to improve the model's ability to recognize different disease types. Next, a word-vector dictionary pre-trained on medical text is used to encode the labels of the visual features, which makes the generated reports more professional. Finally, a memory-driven module is introduced into the decoder to handle long-distance dependencies in medical image data. The method is validated on the chest X-ray dataset collected at Indiana University (IU X-Ray) and on the Medical Information Mart for Intensive Care chest X-ray dataset (MIMIC-CXR) released by the Massachusetts Institute of Technology and Massachusetts General Hospital. Experimental results indicate that the proposed method focuses better on affected areas, improves the accuracy and fluency of the generated reports, and can help radiologists complete medical image reports quickly.
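The abstract does not specify how the visual and semantic features are fused or how the memory module is updated, so the following is only a minimal sketch under stated assumptions. It assumes that fusion is a concatenation of visual and text tokens along the sequence axis, and that the memory is refreshed by an element-wise sigmoid gate. The function names `fuse_features` and `update_memory` are hypothetical and do not come from the paper.

```python
import math


def fuse_features(visual_tokens, text_tokens):
    """Join visual and text tokens into one sequence (assumed fusion scheme).

    Each token is a list of floats. All tokens must share the same feature
    width, as they would after projection to a common dimension.
    """
    tokens = list(visual_tokens) + list(text_tokens)
    if not tokens:
        raise ValueError("at least one token is required")
    width = len(tokens[0])
    if any(len(tok) != width for tok in tokens):
        raise ValueError("all tokens must share the same feature width")
    return tokens


def sigmoid(x):
    """Logistic function, used here to turn a gate logit into a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def update_memory(memory, candidate, gate_logits):
    """Gated memory update (assumed form): new = g * candidate + (1 - g) * memory.

    memory, candidate, and gate_logits are equally shaped lists of rows.
    A gate near 1 overwrites a slot; a gate near 0 keeps the old content,
    which is how earlier context can persist over many decoding steps.
    """
    updated = []
    for m_row, c_row, g_row in zip(memory, candidate, gate_logits):
        row = []
        for m, c, g in zip(m_row, c_row, g_row):
            weight = sigmoid(g)
            row.append(weight * c + (1.0 - weight) * m)
        updated.append(row)
    return updated


if __name__ == "__main__":
    # Three image-patch tokens and two history-text tokens, feature width 4.
    visual = [[0.1] * 4, [0.2] * 4, [0.3] * 4]
    text = [[0.5] * 4, [0.6] * 4]
    fused = fuse_features(visual, text)
    print(len(fused), "fused tokens of width", len(fused[0]))

    # One memory slot of width 2, updated with a neutral gate (logit 0).
    print(update_memory([[0.0, 2.0]], [[1.0, 4.0]], [[0.0, 0.0]]))
```

In a full model, the gate logits and candidate values would be produced by learned layers from the current decoder state. They are passed in directly here to keep the example self-contained.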