Yang Lian, Wan Yiliang, Pan Feng
Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Jiefang Avenue 1277, Wuhan, 430022, China.
Hubei Province Key Laboratory of Molecular Imaging, Wuhan, 430000, China.
J Imaging Inform Med. 2025 Feb 19. doi: 10.1007/s10278-025-01446-1.
Rapid advances in deep learning have revolutionized medical imaging diagnosis. However, training these models is often challenged by label imbalance and the scarcity of data for certain diseases. Most models also fail to recognize multiple coexisting diseases, which are common in real-world clinical practice. Moreover, most radiological models rely solely on image data, in contrast to radiologists' comprehensive approach, which incorporates both images and other clinical information such as clinical history and laboratory results. In this study, we introduce a Multimodal Chest X-ray Network (MCX-Net) that integrates chest X-ray images and clinical history texts for multi-label disease diagnosis. The integration is achieved by combining a pretrained text encoder, a pretrained image encoder, and a pretrained image-text cross-modal encoder, fine-tuned on the public MIMIC-CXR-JPG dataset to diagnose 13 distinct lung diseases on chest X-rays. MCX-Net achieved the highest macro AUROC of 0.816 on the test set, significantly outperforming unimodal baselines such as ViT-base and ResNet152, which scored 0.747 and 0.749, respectively (p < 0.001). This multimodal approach represents a substantial advance over existing image-based deep-learning diagnostic systems for chest X-rays.
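The abstract does not include code, but the described architecture, separate pretrained image and text encoders whose features are fused by a cross-modal component and passed to a multi-label head, can be illustrated with a minimal PyTorch sketch. Everything below (the MultimodalClassifier class, the feature dimensions, and the attention-based fusion) is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of a multimodal, multi-label chest X-ray classifier in the
# spirit of MCX-Net. Encoders, dimensions, and fusion strategy are assumptions.
import torch
import torch.nn as nn

NUM_LABELS = 13  # the 13 lung diseases mentioned in the abstract


class MultimodalClassifier(nn.Module):
    def __init__(self, img_dim=768, txt_dim=768, hidden=512):
        super().__init__()
        # Stand-ins for features from pretrained encoders; in practice these would
        # come from e.g. a ViT/ResNet image backbone and a BERT-style text encoder.
        self.image_proj = nn.Linear(img_dim, hidden)
        self.text_proj = nn.Linear(txt_dim, hidden)
        # Simple cross-modal fusion: image tokens attend to clinical-history tokens
        # (an illustrative choice, not necessarily the paper's design).
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.head = nn.Linear(hidden, NUM_LABELS)  # one logit per disease

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, N_patches, img_dim); txt_feats: (B, N_tokens, txt_dim)
        q = self.image_proj(img_feats)
        kv = self.text_proj(txt_feats)
        fused, _ = self.cross_attn(q, kv, kv)
        pooled = fused.mean(dim=1)        # average-pool the fused tokens
        return self.head(pooled)          # multi-label logits


# Multi-label training uses an independent sigmoid per disease rather than softmax.
model = MultimodalClassifier()
criterion = nn.BCEWithLogitsLoss()

img = torch.randn(4, 196, 768)    # dummy patch embeddings from an image encoder
txt = torch.randn(4, 64, 768)     # dummy token embeddings from a text encoder
labels = torch.randint(0, 2, (4, NUM_LABELS)).float()

logits = model(img, txt)
loss = criterion(logits, labels)
loss.backward()
print(logits.shape, loss.item())  # torch.Size([4, 13]) and the BCE loss value
```

The reported macro AUROC corresponds to averaging the per-disease AUROC over the 13 labels, which can be computed for an (N, 13) matrix of labels and predicted probabilities with, for example, sklearn.metrics.roc_auc_score(y_true, y_prob, average="macro").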