New York University Abu Dhabi, Computer Engineering, Abu Dhabi, 129188, UAE.
Sci Rep. 2024 Sep 28;14(1):22516. doi: 10.1038/s41598-024-74043-x.
Self-supervised learning methods for medical images primarily rely on the imaging modality during pretraining. Although such approaches deliver promising results, they do not take advantage of the associated patient or scan information collected within Electronic Health Records (EHR). This study aims to develop a multimodal pretraining approach for chest radiographs that incorporates EHR data as an additional modality during training. We propose to incorporate EHR data during self-supervised pretraining with a Masked Siamese Network (MSN) to enhance the quality of chest radiograph representations. We investigate three types of EHR data: demographics, scan metadata, and inpatient stay information. We evaluate the multimodal MSN on three publicly available chest X-ray datasets, MIMIC-CXR, CheXpert, and NIH-14, using two vision transformer (ViT) backbones, ViT-Tiny and ViT-Small. When the quality of the representations is assessed through linear evaluation, our proposed method demonstrates significant improvement over vanilla MSN and state-of-the-art self-supervised learning baselines. In particular, it achieves an improvement of 2% in the Area Under the Receiver Operating Characteristic Curve (AUROC) compared to vanilla MSN and 5% to 8% compared to other baselines, including uni-modal ones. Furthermore, our findings reveal that demographic features provide the largest performance improvement. Our work highlights the potential of EHR-enhanced self-supervised pretraining for medical imaging and opens opportunities for future research to address limitations in existing representation learning methods for other medical imaging modalities, such as neuro-, ophthalmic, and sonar imaging.
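The results above are reported as AUROC under linear evaluation. As a rough illustration of what that metric measures, the following minimal sketch computes a binary AUROC from classifier scores using the rank-based (Mann-Whitney) formulation. It is not the authors' evaluation code: the function name `auroc` and the toy labels and scores are hypothetical, chosen only for illustration.

```python
import numpy as np


def auroc(labels, scores):
    """Binary AUROC via the Mann-Whitney U statistic.

    Hypothetical helper for illustration only; tied scores receive
    average ranks, so a tie counts as half a correct ordering.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1

    # U counts the (positive, negative) pairs ranked in the correct order.
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy example (made-up values): 3 of the 4 positive/negative pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a gain of a few percentage points reflects a larger share of positive/negative pairs being ranked correctly by the linear classifier trained on the frozen representations.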